Description: |
Semantic segmentation of clouds and cloud shadows is an important task in remote sensing and atmospheric science. However, the complexity of cloud and shadow shapes, together with noise disturbances such as snow and ice, buildings, complex backgrounds, and atmospheric optics, makes this task challenging. Traditional deep convolutional networks preserve detail and generalize well thanks to their local feature extraction and spatial invariance, but they are relatively weak at modeling global context, which leads to misclassifications and missed detections in complex scenes. Transformers can effectively capture long-range dependencies through self-attention, but they can struggle to extract local image features and maintain spatial consistency, resulting in loss of detail and insufficient generalization. This article proposes a hybrid-branch semantic segmentation network that runs a convolutional network and a transformer in parallel. A series of modules is designed to address the lack of multiscale feature extraction and the insufficient feature fusion found in some convolution–transformer hybrid networks. In particular, the network exploits the rich information in auxiliary bands, such as near-infrared, to improve segmentation performance, allowing it to process a wider range of data and generalize better. Experimental results on CloudSEN12, 38-Cloud, and SPARCS-Val show that our network outperforms existing methods. With the band-fusion branch (HyCloudX), the network further improves segmentation performance and generalization, especially under complex noise interference.
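  The parallel design can be illustrated with a minimal PyTorch sketch: a convolutional branch for local detail, a transformer branch for global context, a band-fusion stem that accepts auxiliary inputs such as near-infrared, and a simple fusion head. All module names, layer sizes, and the fusion strategy below are illustrative assumptions, not the paper's actual implementation.

    # Minimal sketch of a parallel CNN/transformer segmentation network with a
    # band-fusion stem for an auxiliary NIR channel. Sizes and module choices are
    # illustrative assumptions, not taken from the HyCloudX paper.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DualBranchCloudNet(nn.Module):
        def __init__(self, in_bands=4, num_classes=2, dim=64):
            super().__init__()
            # Band-fusion stem: mixes RGB + auxiliary bands (e.g. NIR) before both branches.
            self.stem = nn.Sequential(
                nn.Conv2d(in_bands, dim, 3, stride=2, padding=1),
                nn.BatchNorm2d(dim), nn.ReLU(inplace=True),
            )
            # Convolutional branch: local detail via stacked 3x3 convolutions.
            self.cnn_branch = nn.Sequential(
                nn.Conv2d(dim, dim, 3, padding=1), nn.BatchNorm2d(dim), nn.ReLU(inplace=True),
                nn.Conv2d(dim, dim, 3, padding=1), nn.BatchNorm2d(dim), nn.ReLU(inplace=True),
            )
            # Transformer branch: global context via self-attention over downsampled tokens.
            self.pool = nn.Conv2d(dim, dim, 3, stride=2, padding=1)  # reduce token count
            encoder_layer = nn.TransformerEncoderLayer(
                d_model=dim, nhead=4, dim_feedforward=4 * dim, batch_first=True)
            self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=2)
            # Fusion of the two branches, followed by a segmentation head.
            self.fuse = nn.Conv2d(2 * dim, dim, 1)
            self.head = nn.Conv2d(dim, num_classes, 1)

        def forward(self, x):
            h, w = x.shape[-2:]
            feat = self.stem(x)                         # (B, dim, H/2, W/2)
            local = self.cnn_branch(feat)               # local-detail features
            tokens = self.pool(feat)                    # (B, dim, H/4, W/4)
            b, c, th, tw = tokens.shape
            tokens = tokens.flatten(2).transpose(1, 2)  # (B, N, dim) token sequence
            tokens = self.transformer(tokens)           # long-range context via self-attention
            glob = tokens.transpose(1, 2).reshape(b, c, th, tw)
            glob = F.interpolate(glob, size=local.shape[-2:], mode="bilinear", align_corners=False)
            fused = self.fuse(torch.cat([local, glob], dim=1))
            logits = self.head(fused)
            return F.interpolate(logits, size=(h, w), mode="bilinear", align_corners=False)

    if __name__ == "__main__":
        net = DualBranchCloudNet(in_bands=4, num_classes=2)  # RGB + NIR input
        out = net(torch.randn(1, 4, 128, 128))
        print(out.shape)  # torch.Size([1, 2, 128, 128])

  Concatenation followed by a 1x1 convolution is used here only as the simplest possible fusion; the paper's dedicated multiscale extraction and fusion modules would replace that step.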