CGMAformer: CNN and gated multi axial-sparse transformer feature fusion network for image deraining.

Authors: Qiu, Yongsheng; Lu, Yuanyao; Wang, Yuantao
Source: Multimedia Systems; Dec 2024, Vol. 30 Issue 6, p1-27, 27p
Abstract: In the realm of image deraining, traditional CNN-based deep learning deraining systems exhibit efficient expression of local features and strong generalization capabilities. However, their limited local receptive fields and independence from input content hinder their ability to model global features, rendering them less effective in mitigating complex and dynamic long rain streak scenarios. On the other hand, deraining systems based on the Transformer architecture possess robust global feature aggregation capabilities. Yet their computational complexity increases quadratically with the expansion of the image spatial scale, making them less suitable for high-quality image deraining tasks. For ill-posed problems like image deraining, the precise representation of both local and global features has become increasingly pivotal to addressing the multifaceted challenges of rain streak removal. Consequently, we introduce an innovative solution, the CNN and Gated Multi Axial-Sparse Transformer Feature Fusion Network, referred to as CGMAformer. This approach optimizes both architectural paradigms jointly, effectively harnessing their respective strengths for image deraining. Specifically, in the local feature extraction phase based on CNN, we employ the Degradation-aware Mixture of Experts Feature Compensator (DMEFC) for adaptive representation of local spatial rain streak features. In the global feature extraction phase based on the Transformer, we introduce a dual-branch adaptive Gated Multi-Axis Sparse Transformer (GAST) attention mechanism to complement global background spatial features in rainy images. This approach ensures the preservation of global feature integrity while effectively reducing model complexity. Ultimately, through a feature fusion network, we fully exploit the local characteristics of CNN and the self-attention-based global aggregation capabilities of the Transformer for efficient image deraining. [ABSTRACT FROM AUTHOR]
Database: Complementary Index