Abstract: |
Scene semantic segmentation plays an important role in computer vision. For real-time image segmentation, the waterfall atrous spatial pooling network (WASPnet) and the residual-two-network-segmentation (Res2Net-Seg) are two widely used lightweight networks. However, during feature extraction WASPnet tends to lose local features, whereas Res2Net-Seg tends to fuse trivial local features. To solve these problems, this paper incorporates the advantages of Res2Net-Seg into WASPnet and extends WASPnet in two respects. First, a buffer ladder based on the atrous convolution structure and the spatial pyramid pooling architecture is used to improve deep feature extraction by capturing multi-scale context. Second, the proposed architecture introduces into the decoder a channel attention mechanism that operates on the score maps output by the proposed structure. Compared with WASPnet, the proposed network increases the mean intersection over union (MIoU) on the Pascal Visual Object Classes (VOC) 2012 and Cityscapes datasets by 2.76% and 3.19%, respectively. Moreover, the buffer ladder improves not only the lightweight networks but also DeepLabv3+, which performs best to date and contains a module similar to that of WASPnet: it raises the MIoU of DeepLabv3+ on the Pascal VOC 2012 and Cityscapes datasets by 1.48% and 2.11%, respectively. Finally, this paper verifies the real-time performance on a GTX 2080Ti graphics processing unit (GPU), and the results show that the proposed networks are capable of real-time segmentation. [ABSTRACT FROM AUTHOR] |
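The buffer ladder builds on atrous convolution and spatial pyramid pooling to capture multi-scale context. The Python sketch below illustrates only the underlying receptive-field arithmetic, per spatial axis: how dilation widens a 3-tap kernel, and how cascading atrous layers (a waterfall arrangement) accumulates context stage by stage, whereas parallel ASPP-style branches each see a single scale. It is not the paper's buffer ladder; the helper names and the dilation rates 6, 12, 18 and 24 are hypothetical choices for illustration.

```python
# Receptive-field arithmetic for stride-1 atrous (dilated) convolutions,
# per spatial axis. The dilation rates below are illustrative assumptions,
# not the configuration reported in the paper.


def effective_kernel(k: int, d: int) -> int:
    """Input span of a k-tap kernel with dilation d: k + (k - 1)(d - 1)."""
    return k + (k - 1) * (d - 1)


def parallel_receptive_fields(k: int, rates: list[int]) -> list[int]:
    """ASPP-style: each branch reads the input through one dilation only."""
    return [effective_kernel(k, d) for d in rates]


def waterfall_receptive_fields(k: int, rates: list[int]) -> list[int]:
    """Cascaded (waterfall) branches: each layer consumes the previous
    layer's output, so receptive fields accumulate stage by stage."""
    fields = []
    rf = 1
    for d in rates:
        # A stride-1 layer widens the field by (its span - 1).
        rf += effective_kernel(k, d) - 1
        fields.append(rf)
    return fields


if __name__ == "__main__":
    rates = [6, 12, 18, 24]  # hypothetical rates
    print(parallel_receptive_fields(3, rates))   # [13, 25, 37, 49]
    print(waterfall_receptive_fields(3, rates))  # [13, 37, 73, 121]
```

Tapping the output of every stage of the cascade yields four progressively larger receptive fields from a single branch, which is one way a cascade can supply multi-scale context with fewer parallel paths.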
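The abstract does not specify the exact form of the channel attention applied to the score maps in the decoder. As one common assumption, the sketch below uses a squeeze-and-excitation style block: global average pooling per channel, a two-layer bottleneck (ReLU, then sigmoid), and per-channel rescaling. The function name, the weight shapes and the hand-set placeholder weights are hypothetical, and biases are omitted for brevity.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(score_maps, w_reduce, w_expand):
    """Squeeze-and-excitation style channel attention (biases omitted).

    score_maps: C channels, each an H x W list of lists of floats.
    w_reduce:   R x C weights for the bottleneck layer (typically R < C).
    w_expand:   C x R weights mapping back to one weight per channel.
    Returns (reweighted score maps, per-channel attention weights).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    pooled = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in score_maps
    ]
    # Excitation: fully connected -> ReLU -> fully connected -> sigmoid.
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w_reduce]
    weights = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_expand]
    # Scale: multiply every value in channel c by its attention weight.
    scaled = [
        [[weights[c] * v for v in row] for row in ch]
        for c, ch in enumerate(score_maps)
    ]
    return scaled, weights


if __name__ == "__main__":
    maps = [[[2.0, 2.0], [2.0, 2.0]],   # channel 0, mean 2.0
            [[1.0, 1.0], [1.0, 1.0]]]   # channel 1, mean 1.0
    w_reduce = [[1.0, -1.0]]            # placeholder weights, R = 1
    w_expand = [[1.0], [-1.0]]
    _, weights = channel_attention(maps, w_reduce, w_expand)
    print([round(w, 3) for w in weights])  # [0.731, 0.269]
```

In a trained network the two weight matrices are learned, so the decoder can emphasize class channels whose score maps carry useful evidence and suppress the rest.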
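The reported gains are measured in mean intersection over union (MIoU). The following stdlib-only sketch shows how MIoU is commonly computed: accumulate a confusion matrix over labelled pixels, skipping an ignore label (assumed here to be 255, the usual void label in Pascal VOC and in Cityscapes train-ID annotations), compute IoU = TP / (TP + FP + FN) per class, and average over the classes that occur. The function names are hypothetical; benchmark code typically accumulates the matrix over the whole evaluation set before averaging.

```python
def confusion_matrix(gt, pred, num_classes, ignore_index=255):
    """Rows index ground-truth classes, columns index predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        if g == ignore_index:  # skip void / unlabelled pixels
            continue
        cm[g][p] += 1
    return cm


def mean_iou(cm):
    """Average per-class IoU, skipping classes absent from both gt and pred."""
    ious = []
    n = len(cm)
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # class c predicted as other
        fp = sum(cm[r][c] for r in range(n)) - tp  # other predicted as class c
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    gt = [0, 0, 1, 1, 2, 255]   # flattened ground-truth labels
    pred = [0, 1, 1, 1, 2, 0]   # flattened predictions
    cm = confusion_matrix(gt, pred, num_classes=3)
    print(round(mean_iou(cm), 4))  # 0.7222 (= (1/2 + 2/3 + 1) / 3)
```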