Description: |
Vision Transformer (ViT) was proposed as a new image recognition method in the field of computer vision. ViT applies the Transformer architecture, which has shown excellent performance in natural language processing, to image recognition. Unlike existing Convolutional Neural Network (CNN) models, ViT can achieve State-Of-The-Art (SOTA) image recognition without building inductive biases into the model, demonstrating that the Transformer is a useful architecture in computer vision. However, ViT requires very large datasets such as ImageNet-21K and Joint Foto Tree (JFT) for training, and training takes a long time. Moreover, positional information is lost because images are fed into the model in patch units. Many models have been proposed to address these issues. In this paper, a new model is proposed that restructures the Convolutional neural networks Meet vision Transformers (CMT) model by applying the Coordinate Attention block, an attention mechanism designed for CNNs, to mitigate the problems of the Vision Transformer family of models. The proposed model combines the Transformer, which excels at modeling long-range dependencies, with the CNN, which excels at extracting local features, to achieve higher performance than existing models. We also compared the performance of the proposed model with that of existing models on relatively small datasets, namely Canadian Institute For Advanced Research-10 (CIFAR-10), Self-Taught Learning-10 (STL-10), and Tiny-ImageNet, to make the evaluation easier for researchers to reproduce. Despite being restructured from the smallest CMT variant, CMT-Tiny, the proposed model showed better overall accuracy than the CMT-Tiny, CMT-XS, CMT-S, and CMT-B models on the CIFAR-10, STL-10, and Tiny-ImageNet datasets. On CIFAR-10, the proposed model reached an accuracy of 90.21%, higher than every existing CMT model except CMT-S (90.6%), while achieving the lowest loss value, 0.3967. The proposed model is expected to serve as a backbone for object detection and segmentation in the future.
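
As a minimal sketch (illustrative, not taken from the paper), the Coordinate Attention block referenced above can be written in PyTorch as follows, based on the original Coordinate Attention formulation by Hou et al. (CVPR 2021). The class name, reduction ratio, and activation choice are assumptions of this sketch; the exact placement of the block inside the restructured CMT stages is specific to the paper and is not reproduced here.

import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    # Coordinate Attention (Hou et al., CVPR 2021): factorizes channel
    # attention into two 1-D encodings along height and width, so the
    # attention weights retain positional information instead of
    # collapsing the whole spatial map as in plain channel attention.
    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        # Pool along one spatial axis at a time, keeping the other.
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # (B, C, 1, W)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        x_h = self.pool_h(x)                      # (B, C, H, 1)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)  # (B, C, W, 1)
        # Encode both directions with one shared 1x1 conv, then split.
        y = torch.cat([x_h, x_w], dim=2)          # (B, C, H+W, 1)
        y = self.act(self.bn(self.conv1(y)))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                      # (B, C, H, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # (B, C, 1, W)
        # Reweight the input per (height, width) coordinate.
        return x * a_h * a_w

# Example usage (shapes are illustrative):
# x = torch.randn(2, 64, 32, 32)
# out = CoordinateAttention(64)(x)  # same shape as x, positionally reweighted

Because the two attention maps are one-dimensional (one over height, one over width), the block adds little overhead while preserving the location information that patch-based Transformer inputs tend to lose, which is the motivation given in the abstract for combining it with CMT.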