Lightweight Semantic Segmentation by Combining Multi-Link Feature Codec with Wavelet Pooling

YI Qingming^{1, 2}, WANG Yu^1, SHI Min^1, LUO Aiwen^1

1. School of Information Science and Technology, Jinan University, Guangzhou 510632, China
2. Taidou Microelectronic Science and Technology Co., Ltd., Guangzhou 510663, China

doi: 10.12178/1001-0548.2023124

Funding: National Natural Science Foundation of China (62002134); Guangdong Basic and Applied Basic Research Foundation (2020A1515110645, 2023A1515010834); Guangdong Provincial Key Laboratory of Novel Semiconductors and Devices for Universities (2021KSY001); Yangcheng Innovation and Entrepreneurship Leading Talent Support Program (2019019); Guangdong Science and Technology Innovation Strategy Special Fund (University Student Science and Technology Innovation Cultivation) (pdjh2023b0061).

Author biography: YI Qingming, PhD, professor, whose research focuses on multimedia information processing

Corresponding author e-mail: luoaiwen@jnu.edu.cn

CLC number: TP391

Abstract: Semantic segmentation is one of the basic technologies of scene understanding. Existing semantic segmentation networks are usually structurally complex, have many parameters, lose too much image feature information, and are computationally inefficient. To address these problems, this work proposes MLWP-Net (Multi-Link Wavelet-Pooled Network), a lightweight semantic segmentation network built on the encoder-decoder framework and the discrete wavelet transform (DWT) that combines multi-link feature encoding and decoding with wavelet pooling. In the encoding stage, a lightweight feature-extraction bottleneck is designed using a multi-link strategy together with depthwise separable convolution, dilated convolution, and channel compression, and a low-frequency-mixed wavelet pooling operation replaces the conventional downsampling operation to reduce the information lost during encoding. In the decoding stage, a multi-branch parallel dilated convolutional decoder fuses multi-level features and restores the image resolution in parallel. Experimental results show that MLWP-Net, with only 0.74 M parameters, achieves 74.1% and 68.2% mIoU on the Cityscapes and CamVid datasets, respectively, which demonstrates the effectiveness of the method.

Keywords:
- real-time semantic segmentation
- lightweight neural network
- multi-link feature fusion
- wavelet pooling
- multi-branch dilated convolution

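Figure 1 Overall architecture of MLWP-Net

Figure 2 Comparison of different block structures

Figure 3 Low-frequency-mixed wavelet pooling (LWP)

Figure 4 Multi-branch parallel dilated convolutional decoder (MPDCD)

Figure 5 Semantic segmentation results of different models on the Cityscapes dataset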

Table 1 Results of different convolution modules in MLWP-Net

Bottleneck	mIoU/%	GFLOPs	Params/M
ResNet^[21]	60.4	38.4	0.57
Non-bt-1D^[7]	71.8	35.6	0.99
PFF (ours)	73.5	18.1	0.74

Table 2 Results of LWP on the Cityscapes validation set

Network	LWP	mIoU/%	GFLOPs	Params/M
ERFNet^[7]	-	68.0	35.4	2.06
ERFNet^[7]	√	72.8	35.9	2.04
DABNet^[10]	-	70.1	13.7	0.76
DABNet^[10]	√	70.6	14.4	0.65
ESNet^[26]	-	70.7	32.1	1.66
ESNet^[26]	√	72.0	32.7	1.63
MLWP-Net	-	72.3	17.6	0.84
MLWP-Net	√	73.5	18.1	0.74


Table 3 Results of MPDCD on the Cityscapes validation set

Network	MPDCD	mIoU/%	GFLOPs	Params/M
CGNet^[9]	-	64.8	9.24	0.49
CGNet^[9]	√	70.1	15.51	0.51
FRNet^[11]	-	70.4	16.97	1.01
FRNet^[11]	√	71.1	23.27	1.03
MLWP-Net	-	72.1	13.6	0.73
MLWP-Net	√	73.5	18.1	0.74


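Table 4 Results of different wavelet basis functions on the CamVid validation set

Method	Orthogonality	Symmetry	Compact support	Speed/fps	mIoU/%
Haar	Yes	Symmetric	Yes	95	68.2
db1	Yes	Near-symmetric	Yes	94.8	67.8
rbio1.1	No	Symmetric	No	95.3	67.4
bior1.1	No	Asymmetric	Yes	92.1	67.6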

Table 5 Results of different models on the Cityscapes test set

Method	Pretrain	Input Size	mIoU/%	Params/M	Speed/fps
SegNet^[3]	ImageNet	360×640	56.1	29.5	38.2
RefineNet^[5]	ImageNet	512×1024	73.6	118.1	9.1
SQNet^[27]	ImageNet	512×1024	59.8	16.3	25.7
BiSeNetV2^[28]	No	512×1024	73.6	6.2	51.0
ENet^[6]	No	512×1024	58.3	0.36	27.4
ERFNet^[7]	No	512×1024	68.0	2.1	41.9
LEDNet^[8]	No	512×1024	69.2	0.95	59.6
CGNet^[9]	No	512×1024	64.8	0.49	65.6
DABNet^[10]	No	512×1024	70.1	0.76	102
FRNet^[11]	No	512×1024	70.4	1.01	127
ESNet^[26]	No	512×1024	70.7	1.66	63.0
EDANet^[29]	No	512×1024	67.3	0.68	105.5
ESPNet^[30]	No	512×1024	60.3	0.36	146.0
ContextNet^[31]	No	1024×2048	66.1	0.85	57.7
Fast-SCNN^[32]	No	1024×2048	68.0	1.1	67.1
DFANet^[33]	ImageNet	1024×1024	71.3	7.8	100
LRNNet^[34]	No	512×1024	72.2	0.68	71
AGLNet^[35]	No	512×1024	71.3	1.12	52
DDPNet^[36]	No	768×1536	74.0	2.52	85.4
CSRNet-light^[38]	ResNet18	512×1024	74.0	——	56
LETNet^[39]	No	512×1024	72.8	0.95	150
MLWP-Net (ours)	No	512×1024	74.1	0.74	85.6


Table 6 Results of different models on the CamVid test set

Method	Pretrain	Input Size	mIoU/%	Params/M	Speed/fps
SegNet^[3]	ImageNet	360×480	55.6	29.5	49.8
ENet^[6]	No	360×480	51.3	0.36	105.7
LEDNet^[8]	No	360×480	66.6	0.95	109.6
CGNet^[9]	No	360×480	65.6	0.50	112
DABNet^[10]	No	360×480	66.4	0.76	117
EDANet^[29]	No	360×480	66.4	0.68	232.2
ESPNet^[30]	No	360×480	55.6	0.36	297.6
DFANet^[33]	No	720×960	64.7	7.8	120.0
LRNNet^[34]	No	360×480	67.6	0.67	83
DDPNet^[36]	No	360×480	67.3	1.1	—
MLWP-Net (ours)	No	360×480	68.2	0.74	95

Table 7 Per-class results of different models on the Cityscapes test set (IoU, %)

Method	Roa	Sid	Bui	Wal	Fen	Pol	TLi	TSi	Veg	Ter	Sky	Ped	Rid	Car	Tru	Bus	Tra	Mot	Bic	Class	Cat
SegNet^[3]	96.4	73.2	84.0	28.4	29.0	35.7	39.8	45.1	87.0	63.8	91.8	62.8	42.8	89.3	38.1	43.1	44.1	35.8	51.9	57.0	79.1
ENet^[6]	96.3	74.2	75.0	32.2	33.2	43.4	34.1	44.0	88.6	61.4	90.6	65.5	38.4	90.6	36.9	50.5	48.1	38.8	55.4	58.3	80.4
ERFNet^[7]	97.2	80.0	89.5	41.6	45.3	56.4	60.5	64.6	91.4	68.7	94.2	76.1	56.4	92.4	45.7	60.6	27.0	48.7	61.8	66.3	85.2
LEDNet^[8]	98.1	79.5	91.6	47.7	49.9	62.8	61.3	72.8	92.6	61.2	94.9	76.2	53.7	90.9	64.4	64.0	52.7	44.4	71.6	70.6	87.1
CGNet^[9]	95.5	78.7	88.1	40.0	43.0	54.1	59.8	63.9	89.6	67.6	92.9	74.9	54.9	90.2	44.1	59.5	25.2	47.3	60.2	64.8	85.7
DABNet^[10]	97.9	82.0	90.6	45.5	50.1	59.3	63.5	67.7	91.8	70.1	92.8	78.1	57.8	93.7	52.8	63.7	56.0	51.3	66.8	70.1	87.0
ESNet^[26]	98.1	80.4	92.4	48.3	49.2	61.5	62.5	72.3	92.5	61.5	94.4	76.6	53.2	94.4	62.5	74.3	52.4	45.5	71.4	70.7	87.4
SQNet^[27]	96.9	75.4	87.9	31.6	35.7	50.9	52.0	61.7	90.9	65.8	93.0	73.8	42.6	91.5	18.8	41.2	33.3	34.0	59.9	59.8	84.3
EDANet^[29]	97.8	80.6	89.5	42.0	46.0	52.3	59.8	65.0	91.4	68.7	93.6	75.7	54.3	92.4	40.9	58.7	56.0	50.2	64.0	67.3	85.8
ESPNet^[30]	97.0	77.5	76.2	35.0	36.1	45.0	35.6	46.3	90.8	63.2	92.6	67.0	40.9	92.3	38.1	52.5	50.1	41.8	57.2	60.3	82.2
LAANet^[37]	97.9	82.9	91.0	47.5	51.5	59.3	66.0	70.3	92.3	69.9	94.7	81.8	61.4	94.2	58.6	74.5	55.1	54.3	69.4	73.6	88.4
MLWP-Net (ours)	98.1	83.4	91.7	55.4	52.5	62.1	67.1	71.8	92.7	70.0	95.0	83.1	63.3	94.7	60.2	75.7	62.5	56.9	71.3	74.1	89.0
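
As a quick consistency check on Table 7, the "Class" column can be read as the mean of the 19 per-class IoU values, where the IoU of a class is conventionally TP / (TP + FP + FN). The sketch below recomputes that mean for the MLWP-Net row. The dictionary name and layout are illustrative choices, and the values are copied from the table rather than recomputed from predictions.

```python
# Per-class IoU (%) of MLWP-Net on the Cityscapes test set, copied from Table 7.
# The keys are the column abbreviations used in the table header.
per_class_iou = {
    "Roa": 98.1, "Sid": 83.4, "Bui": 91.7, "Wal": 55.4, "Fen": 52.5,
    "Pol": 62.1, "TLi": 67.1, "TSi": 71.8, "Veg": 92.7, "Ter": 70.0,
    "Sky": 95.0, "Ped": 83.1, "Rid": 63.3, "Car": 94.7, "Tru": 60.2,
    "Bus": 75.7, "Tra": 62.5, "Mot": 56.9, "Bic": 71.3,
}

# mIoU is the unweighted mean over classes.
miou = sum(per_class_iou.values()) / len(per_class_iou)
print(len(per_class_iou), round(miou, 1))  # 19 74.1
```

The result matches the 74.1% reported in the "Class" column and in Table 5.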


[1]	PENG B. Research on operation stability evaluation of industrial automation system based on improved deep learning[J]. International Journal of Manufacturing Technology and Management, 2022, 36(2/3/4): 141. doi: 10.1504/IJMTM.2022.123660
[2]	KONG L J, WANG Q W, BAO Y C, et al. A survey on medical image segmentation based on deep learning[J]. Radio Communications Technology, 2021, 47(2): 121-130. doi: 10.3969/j.issn.1003-3114.2021.02.001
[3]	BADRINARAYANAN V, KENDALL A, CIPOLLA R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation[J]. IEEE Trans Pattern Anal Mach Intell, 2017, 39(12): 2481-2495. doi: 10.1109/TPAMI.2016.2644615
[4]	CHEN L C, ZHU Y K, PAPANDREOU G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation[C]//European Conference on Computer Vision. Cham: Springer, 2018: 833-851.
[5]	LIN G S, MILAN A, SHEN C H, et al. RefineNet: Multi-path refinement networks for high-resolution semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2017: 1925-1934.
[6]	PASZKE A, CHAURASIA A, KIM S, et al. ENet: A deep neural network architecture for real-time semantic segmentation[EB/OL]. [2023-05-21]. https://arxiv.org/abs/1606.02147.
[7]	ROMERA E, ALVAREZ J M, BERGASA L M, et al. ERFNet: Efficient residual factorized ConvNet for real-time semantic segmentation[J]. IEEE Transactions on Intelligent Transportation Systems, 2018, 19(1): 263-272. doi: 10.1109/TITS.2017.2750080
[8]	WANG Y, ZHOU Q, LIU J, et al. Lednet: A lightweight encoder-decoder network for real-time semantic segmentation[C]//Proceedings of the IEEE International Conference on Image Processing. New York: IEEE, 2019: 1860-1864.
[9]	WU T, TANG S, ZHANG R, et al. CGNet: A light-weight context guided network for semantic segmentation[J]. IEEE Trans Image Process, 2021, 30: 1169-1179. doi: 10.1109/TIP.2020.3042065
[10]	LI G, YUN I, KIM J, et al. DABNet: Depth-wise asymmetric bottleneck for real-time semantic segmentation[EB/OL]. [2023-05-22]. https://arxiv.org/pdf/1907.11357.pdf.
[11]	LU M X, CHEN Z X, JONATHAN WU Q M, et al. FRNet: Factorized and regular blocks network for semantic segmentation in road scene[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(4): 3522-3530. doi: 10.1109/TITS.2020.3037727
[12]	CHOLLET F. Xception: Deep learning with depthwise separable convolutions[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2017: 1251-1258.
[13]	HOWARD A G, ZHU M, CHEN B, et al. MobileNets: Efficient convolutional neural networks for mobile vision applications[EB/OL]. [2023-05-22]. https://arxiv.org/pdf/1704.04861.pdf.
[14]	ZHANG X Y, ZHOU X Y, LIN M X, et al. ShuffleNet: An extremely efficient convolutional neural network for mobile devices[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2018: 6848-6856.
[15]	MA Y, ZHANG L G, DU H M, et al. Traffic sign semantic segmentation based on convolutional neural network[J]. Journal of Frontiers of Computer Science and Technology, 2021, 15(6): 1114-1121.
[16]	SPRINGENBERG J T, DOSOVITSKIY A, BROX T, et al. Striving for simplicity: The all convolutional net[EB/OL]. [2023-05-21]. https://arxiv.org/pdf/1412.6806.pdf.
[17]	JAMALI A. Comparing the performance and application of wavelet transform in digital image processing segmentation[EB/OL]. [2023-05-22]. http://dx.doi.org/10.2139/ssrn.4554509.
[18]	RAMAMONJISOA M, FIRMAN M, WATSON J, et al. Single image depth prediction with wavelet decomposition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2021: 11089-11098.
[19]	LIU P J, ZHANG H Z, ZHANG K, et al. Multi-level wavelet-CNN for image restoration[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. New York: IEEE, 2018: 773-782.
[20]	XUE S K, QIU W Y, LIU F, et al. Wavelet-based residual attention network for image super-resolution[J]. Neurocomputing, 2020, 382: 116-126. doi: 10.1016/j.neucom.2019.11.044
[21]	HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2016: 770-778.
[22]	LIN T Y, DOLLAR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2017: 2117-2125.
[23]	CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(4): 834-848. doi: 10.1109/TPAMI.2017.2699184
[24]	CORDTS M, OMRAN M, RAMOS S, et al. The cityscapes dataset for semantic urban scene understanding[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2016: 3213-3223.
[25]	BROSTOW G J, FAUQUEUR J, CIPOLLA R. Semantic object classes in video: A high-definition ground truth database[J]. Pattern Recognition Letters, 2009, 30(2): 88-97. doi: 10.1016/j.patrec.2008.04.005
[26]	WANG Y, ZHOU Q, XIONG J, et al. ESNet: an efficient symmetric network for real-time semantic segmentation[M]//Pattern Recognition and Computer Vision. Cham: Springer International Publishing, 2019.
[27]	TREML M, ARJONA-MEDINA J A, UNTERTHINER T, et al. Speeding up semantic segmentation for autonomous driving[EB/OL]. [2023-05-21]. https://www.researchgate.net/publication/309935608_Speeding_up_Semantic_Segmentation_for_Autonomous_Driving.
[28]	YU C Q, GAO C X, WANG J B, et al. BiSeNet V2: Bilateral network with guided aggregation for real-time semantic segmentation[J]. International Journal of Computer Vision, 2021, 129(11): 3051-3068. doi: 10.1007/s11263-021-01515-2
[29]	LO S Y, HANG H M, CHAN S W, et al. Efficient dense modules of asymmetric convolution for real-time semantic segmentation[C]//Proceedings of the 1st ACM International Conference on Multimedia in Asia. New York: ACM, 2019: 1-6.
[30]	MEHTA S, RASTEGARI M, CASPI A, et al. ESPNet: efficient spatial pyramid of dilated convolutions for semantic segmentation[M]//Computer Vision – ECCV 2018. Cham: Springer International Publishing, 2018.
[31]	POUDEL R P K, BONDE U, LIWICKI S, et al. ContextNet: Exploring context and detail for semantic segmentation in real-time[EB/OL]. [2023-05-21]. https://arxiv.org/abs/1805.04554.
[32]	POUDEL R P K, LIWICKI S, CIPOLLA R. Fast-SCNN: Fast semantic segmentation network[EB/OL]. [2023-05-21]. https://arxiv.org/abs/1902.04502.
[33]	LI H C, XIONG P F, FAN H Q, et al. DFANet: Deep feature aggregation for real-time semantic segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2019: 9522-9531.
[34]	JIANG W H, XIE Z Z, LI Y Y, et al. LRNNET: A light-weighted network with efficient reduced non-local operation for real-time semantic segmentation[C]//Proceedings of the IEEE International Conference on Multimedia & Expo Workshops. New York: IEEE, 2020: 1-6.
[35]	ZHOU Q, WANG Y, FAN Y W, et al. AGLNet: Towards real-time semantic segmentation of self-driving images via attention-guided lightweight network[J]. Applied Soft Computing, 2020, 96: 106682. doi: 10.1016/j.asoc.2020.106682
[36]	GE R J, HE Y T, XIA C, et al. DDPNet: A novel dual-domain parallel network for low-dose CT reconstruction[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer, 2022: 748-757.
[37]	ZHANG X L, DU B C, WU Z Y, et al. LAANet: Lightweight attention-guided asymmetric network for real-time semantic segmentation[J]. Neural Computing and Applications, 2022, 34(5): 3573-3587. doi: 10.1007/s00521-022-06932-z
[38]	XIONG J J, PO L M, YU W Y, et al. CSRNet: Cascaded Selective Resolution Network for real-time semantic segmentation[J]. Expert Systems with Applications, 2023, 211: 118537. doi: 10.1016/j.eswa.2022.118537
[39]	XU G A, LI J C, GAO G W, et al. Lightweight real-time semantic segmentation network with efficient transformer and CNN[J]. IEEE Transactions on Intelligent Transportation Systems, 2023, 24(12): 15897-15906. doi: 10.1109/TITS.2023.3248089


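Semantic segmentation, a branch of computer vision, assigns a class label to every pixel of an image and is widely used in scene-parsing fields such as industrial automation^[1] and medical imaging^[2]. In urban traffic scenes for autonomous driving in particular, an efficient semantic segmentation model can parse the road scene in real time and supply useful auxiliary information for path planning and for avoiding pedestrians and obstacles. Real applications, however, usually require a segmentation network to be both accurate and fast, so a segmentation algorithm that strikes a good trade-off between segmentation accuracy and computational cost is urgently needed.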

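Most existing strategies for improving segmentation accuracy deepen the network in the hope of obtaining richer image feature information. Networks with strong segmentation results, such as SegNet^[3], DeepLabV3+^[4], and RefineNet^[5], all reach high accuracy, but they have large parameter counts and high computational complexity, which lowers segmentation efficiency. To deploy semantic segmentation in practice and process images in real time, lightweight network design has become an important research goal for real-time semantic segmentation.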

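Existing lightweight networks such as ENet^[6], ERFNet^[7], LEDNet^[8], CGNet^[9], DABNet^[10], and FRNet^[11] have reduced the parameter count to roughly 1 M or less in most cases (ERFNet, at about 2.1 M in Table 5, is somewhat larger). Among them, ENet and ERFNet are two classic lightweight models. By adopting an asymmetric encoder-decoder structure and a channel-pruning strategy, ENet has only 0.36 M parameters, whereas ERFNet uses non-bottleneck residual blocks and replaces standard convolutions with asymmetric convolutions, lowering the parameter count while achieving good segmentation accuracy. Xception^[12] replaces standard convolutions with depthwise separable convolutions, increasing network depth while reducing the parameter count. Building on Xception, MobileNet^[13] introduces depthwise separable convolutions and residual modules to compress the model and accelerate inference, reducing the parameters and computation of the convolutions while preserving good performance. By contrast, ShuffleNet^[14] adopts channel shuffling, using transposition, group convolution, and channel reordering to promote information flow, which simplifies the module and improves computational efficiency. Although these networks have few parameters and retain a reasonable segmentation accuracy, they still struggle to meet the demands of real-world applications^[15].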
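
To make the parameter saving of depthwise separable convolution concrete, the sketch below counts the weights of a standard k x k convolution and of its depthwise separable counterpart (a k x k depthwise convolution followed by a 1 x 1 pointwise convolution). The function names and the channel sizes are illustrative assumptions, not values taken from the paper, and bias terms are ignored.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a k x k depthwise convolution plus a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Illustrative layer size (an assumption, not a layer of MLWP-Net).
c_in, c_out, k = 128, 128, 3
standard = conv_params(c_in, c_out, k)
separable = depthwise_separable_params(c_in, c_out, k)
print(standard, separable, round(standard / separable, 2))  # 147456 17536 8.41
```

For 3 x 3 kernels the reduction factor approaches 9 as the number of output channels grows, since the ratio equals k²·c_out / (k² + c_out).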

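In addition, to reduce feature dimensionality while retaining useful information, most existing semantic segmentation networks downsample with pooling operations such as max pooling, average pooling, and stochastic pooling. Pooling, however, lowers the image resolution and therefore discards image feature information. Researchers have tried to mitigate this loss, for example by replacing pooling with strided convolution or by low-pass filtering out the high-frequency features before downsampling^[16], but such operations either add computation or weaken the network's feature representation. The discrete wavelet transform (DWT), with its strong time-frequency analysis capability, is widely used in signal and image processing^[17]. As deep learning has developed, a growing body of work has applied it to optimizing convolutional neural networks (CNNs): DWT has been used in encoder-decoders to cut parameters while speeding up the network^[18]; combined with residual networks to improve image restoration^[19]; and combined with attention mechanisms to strengthen attention to features in different frequency components^[20]. Existing combinations of wavelet transforms and CNNs, however, do not fully exploit the multi-channel frequency-splitting advantage of the DWT and leave considerable room for improvement.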
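
The contrast between pooling and the DWT can be illustrated with a one-level 2D Haar transform, which halves the resolution but splits each 2 x 2 block into four subbands from which the input can be rebuilt exactly. This is a minimal pure-Python sketch of the plain Haar DWT only; it is not the paper's LWP operation, whose low-frequency mixing is not specified in this excerpt. The function names and the subband labels (labelling conventions vary between libraries) are assumptions made for illustration.

```python
def haar_dwt2(x):
    """One-level 2D Haar DWT of a list-of-lists image with even height and width."""
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, len(x), 2):
        row_ll, row_lh, row_hl, row_hh = [], [], [], []
        for j in range(0, len(x[0]), 2):
            a, b = x[i][j], x[i][j + 1]          # top row of the 2 x 2 block
            c, d = x[i + 1][j], x[i + 1][j + 1]  # bottom row of the 2 x 2 block
            row_ll.append((a + b + c + d) / 2)   # low-frequency approximation
            row_lh.append((a + b - c - d) / 2)   # difference between rows
            row_hl.append((a - b + c - d) / 2)   # difference between columns
            row_hh.append((a - b - c + d) / 2)   # diagonal difference
        ll.append(row_ll)
        lh.append(row_lh)
        hl.append(row_hl)
        hh.append(row_hh)
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuild the full-resolution image from the subbands."""
    h, w = len(ll), len(ll[0])
    x = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            s, t, u, v = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            x[2 * i][2 * j] = (s + t + u + v) / 2
            x[2 * i][2 * j + 1] = (s + t - u - v) / 2
            x[2 * i + 1][2 * j] = (s - t + u - v) / 2
            x[2 * i + 1][2 * j + 1] = (s - t - u + v) / 2
    return x


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
ll, lh, hl, hh = haar_dwt2(image)
print(ll)                                   # [[7.0, 11.0], [23.0, 27.0]]
print(haar_idwt2(ll, lh, hl, hh) == image)  # True
```

A 2 x 2 max pooling of the same image would keep only one value per block and the other three could not be recovered, whereas the four Haar subbands together retain everything. A wavelet-based pooling layer can then choose which subbands to pass on.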

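Based on the above analysis, this paper proposes a lightweight semantic segmentation network that combines multi-link feature encoding and decoding with wavelet pooling, abbreviated MLWP-Net (Multi-Link Wavelet-Pooled Network). It comprises a lightweight progressive feature fusion module (Progressive Feature Fusion, PFF); a low-frequency-mixed wavelet pooling operation (Low-frequency-mixed Wavelet Pooling, LWP), based on wavelet transform theory, for efficient downsampling; and a multi-branch parallel dilated convolutional decoder (Multi-branch Parallel Dilated Convolutional Decoder, MPDCD). Extensive experiments show that MLWP-Net combines low computational complexity with high segmentation accuracy.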

3. Conclusion

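This paper proposes MLWP-Net, a lightweight semantic segmentation network that combines progressive feature fusion with low-frequency-mixed wavelet pooling, to address the insufficient feature extraction and large parameter counts of existing semantic segmentation networks. On the encoder side, it introduces the lightweight multi-link progressive feature fusion (PFF) module and the general-purpose low-frequency-mixed wavelet pooling (LWP) operation. PFF aggregates contextual information effectively and thus extracts image features efficiently. LWP performs downsampling efficiently while addressing the feature information lost by the downsampling operations of existing networks, and it can be inserted into other segmentation networks as their downsampling operation. On the decoder side, the multi-branch parallel dilated convolutional decoder (MPDCD) combines multi-scale contextual features to recover the spatial information of the image efficiently.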
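
One way to see why parallel dilated branches capture multi-scale context is to look at the effective kernel size of a dilated convolution, k + (k - 1)(d - 1) for kernel size k and dilation rate d. The weight count stays that of a k x k kernel. The sketch below evaluates this for a set of dilation rates. The rates are hypothetical, chosen only for illustration, since the excerpt does not list the rates used in MPDCD.

```python
def effective_kernel_size(k: int, dilation: int) -> int:
    """Spatial extent covered by a k x k kernel with the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


# Hypothetical dilation rates for four parallel 3 x 3 branches (not from the paper).
rates = [1, 2, 4, 8]
sizes = [effective_kernel_size(3, r) for r in rates]
print(sizes)  # [3, 5, 9, 17]
```

Each branch still has 3 x 3 weights per channel pair, so widening the context in this way adds no parameters beyond those of the extra branches themselves.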

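Compared with popular existing real-time semantic segmentation networks, MLWP-Net greatly reduces the number of model parameters while maintaining high accuracy (0.74 M parameters at 74.1% mIoU on the Cityscapes test set, Table 5). It has good application prospects on mobile devices and is particularly suited to road-scene segmentation in autonomous driving, which demands both accuracy and timeliness.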
