Lightweight Semantic Segmentation by Combining Multi-Link Feature Codec with Wavelet Pooling

YI Qingming; WANG Yu; SHI Min; LUO Aiwen

doi:10.12178/1001-0548.2023124

Journal of University of Electronic Science and Technology of China, 2024, Accepted Manuscript

YI Qingming, WANG Yu, SHI Min, LUO Aiwen. Lightweight Semantic Segmentation by Combining Multi-Link Feature Codec with Wavelet Pooling[J]. Journal of University of Electronic Science and Technology of China. doi: 10.12178/1001-0548.2023124


YI Qingming^{1, 2}, WANG Yu^{1}, SHI Min^{1}, LUO Aiwen^{1}

1. School of Information Science and Technology, Jinan University, Guangzhou 510632, China
2. Taidou Microelectronic Science and Technology Co., Ltd., Guangzhou 510663, China

Received Date: 2023-04-25
Revised Date: 2023-11-11

Available Online: 2024-04-30

Abstract

Semantic segmentation is one of the basic technologies of scene understanding. Existing semantic segmentation networks usually suffer from complex structures, large numbers of parameters, excessive loss of image feature information, and low computational efficiency. To address these problems, this work proposes a lightweight semantic segmentation network named MLWP-Net (Multi-Link Wavelet-Pooled Network), which builds on the encoder-decoder framework and the discrete wavelet transform (DWT) and combines multi-link feature fusion with wavelet pooling. In the encoding stage, a lightweight feature-extraction bottleneck is designed by combining depthwise separable convolution, dilated convolution, and channel compression, and a multi-link strategy fuses multi-level features; in addition, a low-frequency-mixed wavelet pooling operation replaces the traditional downsampling operation to reduce information loss during encoding. In the decoding stage, a multi-branch parallel dilated convolutional decoder fuses the features linked from different encoder layers and recovers the image resolution in parallel. Experimental results show that MLWP-Net achieves 74.1% and 68.2% mIoU on the Cityscapes and CamVid datasets, respectively, with only 0.74M parameters, which demonstrates its effectiveness for semantic segmentation.
Keywords:
- real-time semantic segmentation
- lightweight neural network
- multi-link feature fusion
- wavelet pooling
- multi-branch dilated convolution

References

[1]	PENG B. Research on operation stability evaluation of industrial automation system based on improved deep learning[J]. International Journal of Manufacturing Technology and Management, 2022, 36(2/3/4): 141. doi: 10.1504/IJMTM.2022.123660
[2]	KONG L J, WANG Q W, BAO Y C, et al. A survey on medical image segmentation based on deep learning[J]. Radio Communications Technology, 2021, 47(2): 121-130. doi: 10.3969/j.issn.1003-3114.2021.02.001 (in Chinese)
[3]	BADRINARAYANAN V, KENDALL A, CIPOLLA R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12): 2481-2495. doi: 10.1109/TPAMI.2016.2644615
[4]	CHEN L C, ZHU Y K, PAPANDREOU G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation[C]//European Conference on Computer Vision. Cham: Springer, 2018: 833-851.
[5]	LIN G S, MILAN A, SHEN C H, et al. RefineNet: Multi-path refinement networks for high-resolution semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2017: 1925-1934.
[6]	PASZKE A, CHAURASIA A, KIM S, et al. ENet: A deep neural network architecture for real-time semantic segmentation[EB/OL]. [2023-05-21]. https://arxiv.org/abs/1606.02147.
[7]	ROMERA E, ALVAREZ J M, BERGASA L M, et al. ERFNet: Efficient residual factorized ConvNet for real-time semantic segmentation[J]. IEEE Transactions on Intelligent Transportation Systems, 2018, 19(1): 263-272. doi: 10.1109/TITS.2017.2750080
[8]	WANG Y, ZHOU Q, LIU J, et al. Lednet: A lightweight encoder-decoder network for real-time semantic segmentation[C]//Proceedings of the IEEE International Conference on Image Processing. New York: IEEE, 2019: 1860-1864.
[9]	WU T, TANG S, ZHANG R, et al. CGNet: A light-weight context guided network for semantic segmentation[J]. IEEE Transactions on Image Processing, 2021, 30: 1169-1179. doi: 10.1109/TIP.2020.3042065
[10]	LI G, YUN I, KIM J, et al. DABNet: Depth-wise asymmetric bottleneck for real-time semantic segmentation[EB/OL]. [2023-05-22]. https://arxiv.org/pdf/1907.11357.pdf.
[11]	LU M X, CHEN Z X, JONATHAN WU Q M, et al. FRNet: Factorized and regular blocks network for semantic segmentation in road scene[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(4): 3522-3530. doi: 10.1109/TITS.2020.3037727
[12]	CHOLLET F. Xception: Deep learning with depthwise separable convolutions[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2017: 1251-1258.
[13]	HOWARD A G, ZHU M, CHEN B, et al. MobileNets: Efficient convolutional neural networks for mobile vision applications[EB/OL]. [2023-05-22]. https://arxiv.org/pdf/1704.04861.pdf.
[14]	ZHANG X Y, ZHOU X Y, LIN M X, et al. ShuffleNet: An extremely efficient convolutional neural network for mobile devices[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2018: 6848-6856.
[15]	MA Y, ZHANG L G, DU H M, et al. Traffic sign semantic segmentation based on convolutional neural network[J]. Journal of Frontiers of Computer Science and Technology, 2021, 15(6): 1114-1121. (in Chinese)
[16]	SPRINGENBERG J T, DOSOVITSKIY A, BROX T, et al. Striving for simplicity: The all convolutional net[EB/OL]. [2023-05-21]. https://arxiv.org/pdf/1412.6806.pdf.
[17]	JAMALI A. Comparing the performance and application of wavelet transform in digital image processing segmentation[EB/OL]. [2023-05-22]. http://dx.doi.org/10.2139/ssrn.4554509.
[18]	RAMAMONJISOA M, FIRMAN M, WATSON J, et al. Single image depth prediction with wavelet decomposition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2021: 11089-11098.
[19]	LIU P J, ZHANG H Z, ZHANG K, et al. Multi-level wavelet-CNN for image restoration[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. New York: IEEE, 2018: 773-782.
[20]	XUE S K, QIU W Y, LIU F, et al. Wavelet-based residual attention network for image super-resolution[J]. Neurocomputing, 2020, 382: 116-126. doi: 10.1016/j.neucom.2019.11.044
[21]	HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2016: 770-778.
[22]	LIN T Y, DOLLAR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2017: 2117-2125.
[23]	CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(4): 834-848. doi: 10.1109/TPAMI.2017.2699184
[24]	CORDTS M, OMRAN M, RAMOS S, et al. The cityscapes dataset for semantic urban scene understanding[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2016: 3213-3223.
[25]	BROSTOW G J, FAUQUEUR J, CIPOLLA R. Semantic object classes in video: A high-definition ground truth database[J]. Pattern Recognition Letters, 2009, 30(2): 88-97. doi: 10.1016/j.patrec.2008.04.005
[26]	WANG Y, ZHOU Q, XIONG J, et al. ESNet: an efficient symmetric network for real-time semantic segmentation[M]//Pattern Recognition and Computer Vision. Cham: Springer International Publishing, 2019.
[27]	TREML M, ARJONA-MEDINA J A, UNTERTHINER T, et al. Speeding up semantic segmentation for autonomous driving[EB/OL]. [2023-05-21]. https://www.researchgate.net/publication/309935608_Speeding_up_Semantic_Segmentation_for_Autonomous_Driving.
[28]	YU C Q, GAO C X, WANG J B, et al. BiSeNet V2: Bilateral network with guided aggregation for real-time semantic segmentation[J]. International Journal of Computer Vision, 2021, 129(11): 3051-3068. doi: 10.1007/s11263-021-01515-2
[29]	LO S Y, HANG H M, CHAN S W, et al. Efficient dense modules of asymmetric convolution for real-time semantic segmentation[C]//Proceedings of the 1st ACM International Conference on Multimedia in Asia. New York: ACM, 2019: 1-6.
[30]	MEHTA S, RASTEGARI M, CASPI A, et al. ESPNet: efficient spatial pyramid of dilated convolutions for semantic segmentation[M]//Computer Vision – ECCV 2018. Cham: Springer International Publishing, 2018.
[31]	POUDEL R P K, BONDE U, LIWICKI S, et al. Contextnet: Exploring context and detail for semantic segmentation in real-time[EB/OL]. [2023-05-21]. https://arxiv.org/abs/1805.04554.
[32]	POUDEL R P K, LIWICKI S, CIPOLLA R. Fast-scnn: Fast semantic segmentation network[EB/OL]. [2023-05-21]. https://arxiv.org/abs/1902.04502.
[33]	LI H C, XIONG P F, FAN H Q, et al. DFANet: Deep feature aggregation for real-time semantic segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2019: 9522-9531.
[34]	JIANG W H, XIE Z Z, LI Y Y, et al. LRNNET: A light-weighted network with efficient reduced non-local operation for real-time semantic segmentation[C]//Proceedings of the IEEE International Conference on Multimedia & Expo Workshops. New York: IEEE, 2020: 1-6.
[35]	ZHOU Q, WANG Y, FAN Y W, et al. AGLNet: Towards real-time semantic segmentation of self-driving images via attention-guided lightweight network[J]. Applied Soft Computing, 2020, 96: 106682. doi: 10.1016/j.asoc.2020.106682
[36]	GE R J, HE Y T, XIA C, et al. DDPNet: A novel dual-domain parallel network for low-dose CT reconstruction[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer, 2022: 748-757.
[37]	ZHANG X L, DU B C, WU Z Y, et al. LAANet: Lightweight attention-guided asymmetric network for real-time semantic segmentation[J]. Neural Computing and Applications, 2022, 34(5): 3573-3587. doi: 10.1007/s00521-022-06932-z
[38]	XIONG J J, PO L M, YU W Y, et al. CSRNet: Cascaded Selective Resolution Network for real-time semantic segmentation[J]. Expert Systems with Applications, 2023, 211: 118537. doi: 10.1016/j.eswa.2022.118537
[39]	XU G A, LI J C, GAO G W, et al. Lightweight real-time semantic segmentation network with efficient transformer and CNN[J]. IEEE Transactions on Intelligent Transportation Systems, 2023, 24(12): 15897-15906. doi: 10.1109/TITS.2023.3248089


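Semantic segmentation, a branch of computer vision, assigns a class label to every pixel of an image and is widely used in scene-parsing applications such as industrial automation^[1] and medical imaging^[2]. In urban traffic scenes for autonomous driving in particular, an efficient segmentation model can parse the road scene in real time and supply useful auxiliary information for path planning and for avoiding pedestrians and obstacles. Real applications, however, usually require a segmentation network to be both accurate and fast, so an algorithm that strikes a good trade-off between segmentation accuracy and computational cost is urgently needed.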

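Most existing strategies for improving segmentation accuracy deepen the network in order to obtain richer image features. Networks with strong segmentation results, such as SegNet^[3], DeepLabV3+^[4], and RefineNet^[5], all reach high accuracy, but they carry large parameter counts and high computational complexity, which hurts segmentation efficiency. To deploy semantic segmentation in practice and process images in real time, lightweight network design has become an important research goal for real-time semantic segmentation.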

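Existing lightweight networks such as ENet^[6], ERFNet^[7], LEDNet^[8], CGNet^[9], DABNet^[10], and FRNet^[11] have brought the parameter count down to the order of 1M. Among them, ENet and ERFNet are two classic lightweight models: ENet uses an asymmetric encoder-decoder structure and channel pruning to reduce its parameter count to only 0.36M, while ERFNet uses a non-bottleneck residual structure and replaces standard convolutions with asymmetric convolutions, lowering the parameter count while keeping good segmentation accuracy. Xception^[12] replaces standard convolutions with depthwise separable convolutions, increasing network depth while reducing parameters. Building on Xception, MobileNet^[13] introduces depthwise separable convolutions and residual modules to compress the model and accelerate inference, cutting the parameters and computation of convolution while maintaining good performance. ShuffleNet^[14], by contrast, uses a channel-shuffle strategy: transposition, group convolution, and channel reordering promote information flow, which simplifies the module and improves computational efficiency. Although these networks have few parameters and retain a certain level of accuracy, they still struggle to meet the demands of real-world applications^[15].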
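To make the parameter saving of depthwise separable convolution concrete, the following minimal sketch counts weights for one layer. The function names and the layer sizes (64 input channels, 128 output channels, 3×3 kernel) are illustrative assumptions, not values from this paper, and biases are ignored.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
c_in, c_out, k = 64, 128, 3
std = standard_conv_params(c_in, c_out, k)
sep = depthwise_separable_params(c_in, c_out, k)
ratio = sep / std  # algebraically equal to 1/c_out + 1/k**2

print(f"standard: {std}, separable: {sep}, ratio: {ratio:.4f}")
```

For a 3×3 kernel the ratio is dominated by the 1/k² term, so the separable layer needs roughly one ninth of the weights once the output channel count is large.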

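In addition, to reduce feature dimensionality while retaining useful information, most existing semantic segmentation networks downsample with pooling operations such as max pooling, average pooling, and stochastic pooling. Pooling lowers the image resolution, however, and therefore discards feature information. Researchers have tried to mitigate this loss, for example by replacing pooling with strided convolution or by low-pass filtering away high-frequency features before downsampling^[16], but such operations either add computation or weaken the network's representational power. The discrete wavelet transform (DWT), with its strong time-frequency analysis capability, is widely used in signal and image processing^[17]. As deep learning has developed, a growing body of work has applied it to optimizing convolutional neural networks (CNNs): applying the DWT in an encoder-decoder to reduce parameters and speed up computation^[18]; combining it with residual networks so that the wavelet transform improves image restoration^[19]; or combining it with attention mechanisms to strengthen attention to features in different frequency components^[20]. Existing combinations of wavelet transforms and CNNs, however, do not fully exploit the multi-channel frequency-splitting advantage of the DWT and leave considerable room for improvement.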
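The contrast between pooling and the DWT can be illustrated with a one-level 2D Haar transform. This is a plain orthonormal Haar decomposition written in pure Python as a sketch; it is not the authors' low-frequency-mixed wavelet pooling, whose details are not given in this excerpt. The function names and the 4×4 test image are assumptions, and sub-band naming conventions (LH versus HL) vary between libraries.

```python
def haar_dwt2(x):
    """One-level 2D Haar DWT of an even-sized 2D list; returns four half-resolution sub-bands."""
    h, w = len(x), len(x[0])
    assert h % 2 == 0 and w % 2 == 0, "height and width must be even"
    ll = [[0.0] * (w // 2) for _ in range(h // 2)]
    lh = [[0.0] * (w // 2) for _ in range(h // 2)]
    hl = [[0.0] * (w // 2) for _ in range(h // 2)]
    hh = [[0.0] * (w // 2) for _ in range(h // 2)]
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = x[i][j], x[i][j + 1]
            c, d = x[i + 1][j], x[i + 1][j + 1]
            ll[i // 2][j // 2] = (a + b + c + d) / 2  # low-frequency approximation
            lh[i // 2][j // 2] = (a - b + c - d) / 2  # difference across columns
            hl[i // 2][j // 2] = (a + b - c - d) / 2  # difference across rows
            hh[i // 2][j // 2] = (a - b - c + d) / 2  # diagonal difference
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuilds the full-resolution array from the four sub-bands."""
    h2, w2 = len(ll), len(ll[0])
    x = [[0.0] * (2 * w2) for _ in range(2 * h2)]
    for i in range(h2):
        for j in range(w2):
            s, p, q, r = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            x[2 * i][2 * j] = (s + p + q + r) / 2
            x[2 * i][2 * j + 1] = (s - p + q - r) / 2
            x[2 * i + 1][2 * j] = (s + p - q - r) / 2
            x[2 * i + 1][2 * j + 1] = (s - p - q + r) / 2
    return x


def max_pool2(x):
    """2 x 2 max pooling with stride 2, for comparison."""
    return [[max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
             for j in range(0, len(x[0]), 2)]
            for i in range(0, len(x), 2)]


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
ll, lh, hl, hh = haar_dwt2(image)
restored = haar_idwt2(ll, lh, hl, hh)
pooled = max_pool2(image)
print(ll, pooled, restored == image)
```

Both `ll` and `pooled` have half the resolution of `image`, but only the wavelet sub-bands taken together allow the input to be rebuilt exactly; max pooling keeps one value per 2×2 block and discards the rest.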

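Based on the above analysis, this paper proposes a lightweight semantic segmentation network that combines multi-link feature encoding and decoding with wavelet pooling, abbreviated MLWP-Net (Multi-Link Wavelet-Pooled Network). It comprises: a lightweight Progressive Feature Fusion (PFF) module; a Low-frequency-mixed Wavelet Pooling (LWP) operation, based on wavelet transform theory, for efficient downsampling; and a Multi-branch Parallel Dilated Convolutional Decoder (MPDCD). Extensive experiments verify that MLWP-Net combines low computational complexity with high segmentation accuracy.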
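The idea behind parallel dilated branches can be sketched in one dimension. The code below is only an illustration of the general technique: the dilation rates (1, 2, 4), the 3-tap kernel, zero padding, and fusion by summation are all assumptions, since this excerpt does not specify how the MPDCD is configured. A dilated kernel of size k and rate d covers k + (k - 1)(d - 1) input positions without adding weights.

```python
def effective_kernel_size(k, d):
    """Span of input positions covered by a size-k kernel with dilation rate d."""
    return k + (k - 1) * (d - 1)


def dilated_conv1d(signal, kernel, dilation):
    """'Same'-size 1D dilated convolution (correlation form) with zero padding; odd kernel assumed."""
    k, n = len(kernel), len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for t in range(k):
            idx = i + (t - k // 2) * dilation  # taps are spaced `dilation` apart
            if 0 <= idx < n:
                acc += kernel[t] * signal[idx]
        out.append(acc)
    return out


def multi_branch(signal, kernel, dilations):
    """Runs one dilated branch per rate in parallel and fuses them by element-wise sum."""
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    return [sum(vals) for vals in zip(*branches)]


# A unit impulse shows which positions each branch can "see".
impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
kernel = [1, 1, 1]
fused = multi_branch(impulse, kernel, dilations=(1, 2, 4))
print([effective_kernel_size(3, d) for d in (1, 2, 4)], fused)
```

With the same three weights, the branches reach 3, 5, and 9 positions respectively, which is why parallel dilated branches are a cheap way to gather context at several scales.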

3. Conclusion

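This paper proposes MLWP-Net, a lightweight semantic segmentation network that combines progressive feature fusion with low-frequency-mixed wavelet pooling, addressing the insufficient feature extraction and large parameter counts of existing semantic segmentation networks. On the encoder side, we design the lightweight multi-link Progressive Feature Fusion (PFF) module and the general-purpose Low-frequency-mixed Wavelet Pooling (LWP) operation. The former aggregates contextual information effectively and thus extracts image features efficiently; the latter addresses the feature loss caused by downsampling in existing networks, performs downsampling efficiently, and can be inserted into other segmentation networks as their downsampling operation. On the decoder side, we propose the multi-branch dilated-convolution feature-fusion decoder (MPDCD), which combines multi-scale contextual features to recover the spatial information of the image efficiently.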

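Compared with popular existing real-time semantic segmentation networks, MLWP-Net substantially reduces the number of model parameters while maintaining high accuracy. It has good application prospects on mobile devices and is particularly suited to road-scene segmentation in autonomous driving, where the demands on both accuracy and timeliness are high.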


Bottleneck	mIoU/%	GFLOPs	Params/M
ResNet^[21]	60.4	38.4	0.57
Non-bt-1D^[7]	71.8	35.6	0.99
PFF (ours)	73.5	18.1	0.74

Network	LWP	mIoU/%	GFLOPs	Params/M
ERFNet^[7]	-	68.0	35.4	2.06
ERFNet^[7]	√	72.8	35.9	2.04
DABNet^[10]	-	70.1	13.7	0.76
DABNet^[10]	√	70.6	14.4	0.65
ESNet^[26]	-	70.7	32.1	1.66
ESNet^[26]	√	72.0	32.7	1.63
MLWP-Net	-	72.3	17.6	0.84
MLWP-Net	√	73.5	18.1	0.74

Module	MPDCD	mIoU/%	GFLOPs	Params/M
CGNet^[9]	-	64.8	9.24	0.49
CGNet^[9]	√	70.1	15.51	0.51
FRNet^[11]	-	70.4	16.97	1.01
FRNet^[11]	√	71.1	23.27	1.03
MLWP-Net	-	72.1	13.6	0.73
MLWP-Net	√	73.5	18.1	0.74

Method	Orthogonality	Symmetry	Compact support	Speed/fps	mIoU/%
Haar	Yes	Symmetric	Yes	95	68.2
db1	Yes	Approximately symmetric	Yes	94.8	67.8
rbio1.1	No	Symmetric	No	95.3	67.4
bior1.1	No	Asymmetric	Yes	92.1	67.6

Method	Pretrain	Input Size	mIoU/%	Params/M	Speed/fps
SegNet^[3]	ImageNet	360×640	56.1	29.5	38.2
RefineNet^[5]	ImageNet	512×1024	73.6	118.1	9.1
SQNet^[27]	ImageNet	512×1024	59.8	16.3	25.7
BiSeNetV2^[28]	No	512×1024	73.6	6.2	51.0
ENet^[6]	No	512×1024	58.3	0.36	27.4
ERFNet^[7]	No	512×1024	68.0	2.1	41.9
LEDNet^[8]	No	512×1024	69.2	0.95	59.6
CGNet^[9]	No	512×1024	64.8	0.49	65.6
DABNet^[10]	No	512×1024	70.1	0.76	102
FRNet^[11]	No	512×1024	70.4	1.01	127
ESNet^[26]	No	512×1024	70.7	1.66	63.0
EDANet^[29]	No	512×1024	67.3	0.68	105.5
ESPNet^[30]	No	512×1024	60.3	0.36	146.0
ContextNet^[31]	No	1024×2048	66.1	0.85	57.7
Fast-SCNN^[32]	No	1024×2048	68.0	1.1	67.1
DFANet^[33]	ImageNet	1024×1024	71.3	7.8	100
LRNNet^[34]	No	512×1024	72.2	0.68	71
AGLNet^[35]	No	512×1024	71.3	1.12	52
DDPNet^[36]	No	768×1536	74.0	2.52	85.4
CSRNet-light^[38]	ResNet18	512×1024	74.0	-	56
LETNet^[39]	No	512×1024	72.8	0.95	150
MLWP-Net (ours)	No	512×1024	74.1	0.74	85.6
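
The mIoU columns in the tables above report mean intersection over union. As a reminder of how that metric is usually computed, here is a minimal sketch from a confusion matrix. The function names and the toy labels are assumptions for illustration; benchmark evaluation scripts additionally handle details such as ignored labels, which this sketch omits, and skipping classes that never occur is a choice made here rather than something stated in the paper.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """cm[t][p] counts pixels whose true class is t and predicted class is p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(cm):
    """Mean over classes of TP / (TP + FP + FN); classes absent from both labels and predictions are skipped."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # true class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, true class differs
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious)


# Toy example: six flattened pixel labels, three classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
miou = mean_iou(cm)
print(f"mIoU = {100 * miou:.1f}%")
```

In this toy case the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is 0.5.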
