Dual-Dependency Attention Transformer for Fine-Grained Visual Classification

Authors:	Shiyan Cui, Bin Hui
Language:	English
Year of publication:	2024
Subject:	deep learning; fine-grained visual classification; vision transformer; Chemical technology; TP1-1185
Source:	Sensors, Vol 24, Iss 7, p 2337 (2024)
Document type:	article
ISSN:	1424-8220
DOI:	10.3390/s24072337
Abstract:	Vision transformers (ViTs) are widely used in visual tasks such as fine-grained visual classification (FGVC). However, self-attention, the core module of vision transformers, has quadratic computational and memory complexity. The sparse-attention and local-attention approaches that most researchers currently use to reduce this cost are poorly suited to FGVC, which requires dense feature extraction and global dependency modeling. To address this challenge, we propose a dual-dependency attention transformer model that decouples global token interactions into two paths. The first is a position-dependency attention pathway based on the intersection of two types of grouped attention. The second is a semantic-dependency attention pathway based on dynamic central aggregation. This design strengthens the semantic modeling of discriminative cues while reducing the computational cost to linear complexity. In addition, we develop discriminative enhancement strategies that use a knowledge-based representation approach to make the tracking of high-confidence discriminative cues more sensitive. Experiments on three datasets, NABIRDS, CUB, and DOGS, show that the method is well suited to fine-grained image classification and balances computational cost against performance.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/79b7dcdb88324a7492239f249e4773b9
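The abstract describes the two token-interaction paths only at a high level. The NumPy sketch below illustrates the general ideas under explicit assumptions and is not the authors' implementation. The grouping scheme (contiguous windows intersected with strided groups), the reading of "dynamic central aggregation" as attention-pooled centers, the averaging fusion, and every function name (`position_pathway`, `semantic_pathway`, `dual_dependency_block`, and the helpers) are hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attend(q, k, v):
    # Scaled dot-product attention: q is (n, d), k and v are (m, d); returns (n, d).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


def window_groups(n, w):
    # Contiguous groups of w tokens: [0..w-1], [w..2w-1], ...
    return [list(range(s, s + w)) for s in range(0, n, w)]


def strided_groups(n, w):
    # Dilated groups whose members are n // w positions apart,
    # e.g. {0, 4, 8, 12} for n = 16, w = 4.
    stride = n // w
    return [list(range(s, n, stride)) for s in range(stride)]


def grouped_attention(x, groups):
    # Self-attention restricted to each group: sum(len(g)**2) scores instead of n**2.
    out = np.empty_like(x)
    for g in groups:
        out[g] = attend(x[g], x[g], x[g])
    return out


def position_pathway(x, w):
    # Hypothetical position-dependency path: window attention, then strided attention.
    # With n == w * w each window meets each strided group in exactly one token,
    # so after both stages every token has an indirect path to every other token.
    n = x.shape[0]
    assert n == w * w, "this sketch assumes n == w * w"
    h = x + grouped_attention(x, window_groups(n, w))
    return h + grouped_attention(h, strided_groups(n, w))


def semantic_pathway(x, center_queries):
    # Hypothetical "dynamic central aggregation": each center is an input-dependent,
    # attention-weighted average of all tokens; tokens then attend only to the centers.
    # Cost is O(n * m * d), linear in the token count n for a fixed number of centers m.
    centers = attend(center_queries, x, x)  # (m, d)
    return x + attend(x, centers, centers)  # (n, d)


def dual_dependency_block(x, w, center_queries):
    # Fuse the two paths by averaging (the fusion rule is an assumption).
    return 0.5 * (position_pathway(x, w) + semantic_pathway(x, center_queries))


rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 8))         # 16 patch tokens (4 x 4 grid), 8 channels
center_queries = rng.standard_normal((3, 8))  # stands in for learned parameters
out = dual_dependency_block(tokens, w=4, center_queries=center_queries)
print(out.shape)  # (16, 8)
```

In this sketch, information flows from token a to token b through the single token that lies in both a's window and b's strided group. Each grouped stage costs O(n·w·d), which is O(n^1.5·d) when w = sqrt(n), while the center-based path costs O(n·m·d) and is linear in n for a fixed number of centers m. The paper reports linear complexity for its own design, whose exact grouping this sketch does not reproduce.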