AViT-UNet: An efficient medical image segmentation method based on multiple attention mechanisms

    Abstract: Existing medical image semantic segmentation methods suffer from high computational complexity, large parameter counts, and limited accuracy, and they are difficult to deploy on low-resource hardware and hospital edge devices. To address these problems, this paper proposes AViT-UNet, a lightweight U-Net-style vision transformer model for medical image semantic segmentation that combines multiple attention mechanisms. First, a lightweight convolutional module, the lightweight dilated bottleneck (LDB), is designed and used as the convolutional module in the encoder and decoder layers, which reduces the computational complexity of the model. Second, a self-attention module, efficient multi-head attention (EMHA), is introduced in the deep layers and the bottleneck layer to improve segmentation accuracy. Finally, for the skip connections and the feature input stage, the network applies channel attention and spatial attention to strengthen the residual connections and increase convolutional depth, producing finer segmentation results. The method compensates for both the high computational cost of transformers and the weakness of convolutional neural networks in capturing global features. It improves semantic segmentation accuracy while keeping the network lightweight, so that the model can be deployed on resource-constrained medical devices and mobile platforms. The method is evaluated with multiple metrics on three public medical image semantic segmentation benchmark datasets: Synapse, GlaS, and MoNuSeg. The results indicate that the method is competitive and feasible. The implementation code is available at https://github.com/shepherdxu/AViT-UNet.
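
    The abstract does not describe the internal layout of LDB, so the following is only an illustrative sketch of why a bottleneck with a dilated convolution can be lighter than a plain convolution. It assumes a generic design (1x1 channel reduction, 3x3 dilated convolution, 1x1 expansion); the function names, the reduction ratio of 4, and the channel width of 256 are hypothetical and are not taken from the paper or its repository.

```python
# Illustrative parameter arithmetic for a generic bottleneck block.
# Assumption: this is NOT the published LDB design, only a common pattern
# (1x1 reduce -> 3x3 dilated conv -> 1x1 expand) used to show the saving.

def conv_params(c_in, c_out, k, bias=False):
    """Number of weights in a k x k convolution from c_in to c_out channels."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def bottleneck_params(c_in, c_out, k=3, reduction=4):
    """Weights in a 1x1 -> k x k -> 1x1 bottleneck (hypothetical layout)."""
    mid = max(1, c_in // reduction)
    return (
        conv_params(c_in, mid, 1)    # 1x1 channel reduction
        + conv_params(mid, mid, k)   # k x k conv; dilation adds no weights
        + conv_params(mid, c_out, 1) # 1x1 channel expansion
    )


def effective_kernel(k, dilation):
    """Spatial extent covered by a k x k kernel with the given dilation."""
    return dilation * (k - 1) + 1


standard = conv_params(256, 256, 3)
light = bottleneck_params(256, 256)
print(standard, light, round(standard / light, 2))
print(effective_kernel(3, 2))
```

    Under these assumptions the bottleneck uses several times fewer weights than the plain 3x3 convolution, while dilation enlarges the area each kernel covers without adding parameters.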

     
