Abstract: |
Fine-grained image recognition (FGIR) aims to distinguish subcategories with visually similar appearances, such as bird species or vehicle makes and models. However, subtle interclass differences and large intraclass variances lead to poor recognition performance. To address these challenges, we developed a mixed-mask teacher–student cooperative training strategy. A mixed masked image is generated by replacing one image's visible marker with another's masked marker and is then embedded into a knowledge distillation network. Collaborative reinforcement between teachers and students improves the recognition performance of the network. We chose the classic transformer architecture as the baseline to better explore the contextual relationships between features. Additionally, we propose a dual dynamic selection plug-in that selects discriminative features in the spatial and channel dimensions and filters out irrelevant interference, efficiently handling background and noise features in fine-grained images. The proposed feature suppression module enhances the differences between features, thereby motivating the network to mine more discriminative features. We validated our method on two datasets: CUB-200-2011 and Stanford Cars. The experimental results show that the proposed MT-DSNet can significantly improve the feature representation for FGIR tasks. Moreover, applying it to different fine-grained networks improves FGIR accuracy without changing the original network structure. We hope that this work provides a promising approach for improving the feature representation of networks in the future.
• Build a novel, highly reliable deep-learning framework for fine-grained images.
• Develop a mixed-mask teacher–student collaborative training strategy.
• Propose a new dual dynamic selection plug-in module.
• Two benchmarks show that our approach has better feature representation capabilities. [ABSTRACT FROM AUTHOR] |