Abstract: |
Many advanced models have been proposed for automatic surface defect inspection. Although CNN-based methods have achieved superior performance among these models, they are limited in extracting global semantic details due to the locality of the convolution operation, and such global semantic details are important for detecting surface defects. Recently, inspired by the success of the Transformer, whose global self-attention mechanism can model global semantic details, researchers have begun to apply Transformer-based methods to many computer-vision challenges. However, as many researchers have noted, Transformers lose spatial details while extracting semantic features. To alleviate these problems, this paper proposes a Transformer-based Hybrid Attention Gate (HAG) model that extracts both global semantic features and spatial features. The HAG model consists of a Transformer (Trans) branch, a channel Squeeze-spatial Excitation (sSE) branch, and a merge process. The Trans branch extracts global semantic features and the sSE branch extracts spatial features. The merge process, which comes in four variants (concat, add, max, and mul), combines these two branches effectively. Finally, four versions of the HAG-Feature Fusion Network (HAG-FFN) were developed using the proposed HAG model for the detection of surface defects. Four different datasets were used to test the performance of the proposed HAG-FFN versions. In the experimental studies, the proposed model achieved mIoU scores of 83.83%, 79.34%, 76.53%, and 81.78% on the MT, MVTec-Texture, DAGM, and AITEX datasets, respectively. These results show that the proposed HAGmax-FFN model outperformed the state-of-the-art models.
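The four merge variants named in the abstract (concat, add, max, mul) can be illustrated with a minimal NumPy sketch. This is a hypothetical illustration of how two feature maps might be fused, not the paper's actual layer definitions; the function name `merge`, the `(channels, height, width)` layout, and the channel-axis choice for `concat` are assumptions.

```python
import numpy as np

def merge(trans_feat, sse_feat, mode="max"):
    """Fuse a Transformer (global) feature map with an sSE (spatial)
    feature map using one of the four merge variants from the abstract.
    Hypothetical sketch; the paper's actual operations may differ."""
    if mode == "concat":
        # Stack along the channel axis (axis 0 in this assumed layout).
        return np.concatenate([trans_feat, sse_feat], axis=0)
    if mode == "add":
        return trans_feat + sse_feat             # element-wise sum
    if mode == "max":
        return np.maximum(trans_feat, sse_feat)  # element-wise maximum
    if mode == "mul":
        return trans_feat * sse_feat             # element-wise product
    raise ValueError(f"unknown merge mode: {mode}")

# Two toy feature maps with shape (channels, height, width).
t = np.ones((4, 8, 8))
s = np.full((4, 8, 8), 2.0)
print(merge(t, s, "concat").shape)  # (8, 8, 8): channels doubled
print(merge(t, s, "max")[0, 0, 0])  # 2.0: element-wise max
```

Note that concat doubles the channel count (usually requiring a projection back down), while add, max, and mul keep the shape unchanged, which is presumably why the paper evaluates each variant separately.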