Abstract:
Semantic segmentation is one of the fundamental technologies in scene understanding. Existing semantic segmentation networks usually suffer from complex structures, a large number of parameters, excessive loss of image feature information, and low computational efficiency. To address these problems, this work proposes a lightweight semantic segmentation network named MLWP-Net (Multi-Link Wavelet-Pooled Network), which is based on the encoder-decoder framework and the Discrete Wavelet Transform (DWT) and combines multi-link feature fusion with wavelet pooling. In the encoding stage, a lightweight feature extraction bottleneck is designed by combining depthwise separable convolution, dilated convolution, and channel compression, and a multi-link strategy is used to fuse multi-level features. In addition, a low-frequency-mixed wavelet pooling operation replaces the traditional downsampling operation to reduce the information loss during encoding. In the decoding stage, a multi-branch parallel dilated convolutional decoder fuses the features linked from different encoder layers and recovers the image resolution in parallel. Experimental results show that MLWP-Net achieves 74.1% and 68.2% mIoU on the Cityscapes and CamVid datasets, respectively, with only 0.74M parameters, which demonstrates its effectiveness for semantic segmentation.
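To illustrate the general idea of replacing strided downsampling with wavelet pooling, the following minimal sketch applies a one-level 2D Haar DWT to a single-channel feature map and keeps only the low-frequency subband as the pooled output. It is an assumption-laden illustration of plain wavelet pooling, not the low-frequency-mixed operation of MLWP-Net, whose details are not given in this abstract. The function names `haar_dwt2` and `wavelet_pool` and the choice of the Haar wavelet are hypothetical.

```python
def haar_dwt2(x):
    """One-level orthonormal 2D Haar DWT of a 2D list with even height and width.

    Returns four half-resolution subbands:
    low (approximation), d_cols (difference across columns),
    d_rows (difference across rows), d_diag (diagonal difference).
    """
    h, w = len(x), len(x[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    low, d_cols, d_rows, d_diag = [], [], [], []
    for i in range(0, h, 2):
        r_low, r_cols, r_rows, r_diag = [], [], [], []
        for j in range(0, w, 2):
            # The 2x2 block that collapses to one output position.
            a, b = x[i][j], x[i][j + 1]
            c, d = x[i + 1][j], x[i + 1][j + 1]
            r_low.append((a + b + c + d) / 2)
            r_cols.append((a - b + c - d) / 2)
            r_rows.append((a + b - c - d) / 2)
            r_diag.append((a - b - c + d) / 2)
        low.append(r_low)
        d_cols.append(r_cols)
        d_rows.append(r_rows)
        d_diag.append(r_diag)
    return low, d_cols, d_rows, d_diag


def wavelet_pool(x):
    """Assumed simplest form of wavelet pooling: keep the low-frequency subband."""
    low, _, _, _ = haar_dwt2(x)
    return low


feature_map = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
pooled = wavelet_pool(feature_map)
print(pooled)  # [[7.0, 11.0], [23.0, 27.0]]
```

Because the Haar transform is orthonormal, the four subbands together preserve the energy of the input, so whatever is not kept in the pooled output remains available in the detail subbands. Max pooling and strided convolution, by contrast, discard that information outright.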