Popis: |
With the rapid development of the digital creative industry, the volume of image data it generates has exploded. Managing such massive image collections effectively requires organizing images into multi-level, multi-class hierarchies. However, existing deep-learning models for hierarchical image classification are all based on convolutional neural networks (CNNs), whose local receptive fields limit their ability to capture global features. In contrast, the Transformer, a more recent neural network architecture, captures global context through its attention mechanism and consequently performs well across a wide range of visual recognition tasks. Yet existing Transformer-based work does not exploit hierarchical structure information within the model, which makes it difficult to apply such models to multi-level, multi-class image classification. This paper therefore proposes a new multi-level, multi-class image classification model that uses a multi-scale CNN to capture features at different scales and combines it with the Transformer's ability to extract global features. The model also makes full use of the hierarchical structure information within the Transformer to better capture the complex relationships in images. We conducted extensive experiments on three datasets, CIFAR-10, CIFAR-100, and CUB-200-2011, and compared our model against existing multi-level, multi-class image classification models. The results show that our model achieves higher classification accuracy and greater robustness. |
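The abstract gives no architectural details, so the following is only a minimal PyTorch sketch of the general idea it describes, not the authors' model. Assumptions (all hypothetical): a three-branch convolutional stem with kernel sizes 3, 5, and 7 as the "multi-scale CNN"; a ViT-style encoder with a class token; and two classification heads, where the coarse head reads the class token from an intermediate layer and the fine head reads it from the last layer. That is one possible way to use the Transformer's layer hierarchy. The default label counts follow CIFAR-100's 20 superclasses and 100 classes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiScaleStem(nn.Module):
    """Hypothetical multi-scale stem: parallel convs with different kernel sizes."""

    def __init__(self, in_ch=3, branch_ch=32, kernel_sizes=(3, 5, 7)):
        super().__init__()
        # padding=k//2 with stride 2 halves the spatial size for every odd k,
        # so all branch outputs can be concatenated along the channel axis.
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(in_ch, branch_ch, k, stride=2, padding=k // 2),
                nn.BatchNorm2d(branch_ch),
                nn.GELU(),
            )
            for k in kernel_sizes
        )
        self.out_channels = branch_ch * len(kernel_sizes)

    def forward(self, x):
        return torch.cat([b(x) for b in self.branches], dim=1)


class HierarchicalHybrid(nn.Module):
    """Hypothetical CNN + Transformer with coarse (intermediate) and fine (final) heads."""

    def __init__(self, num_coarse=20, num_fine=100, img_size=32,
                 embed_dim=128, depth=4, heads=4, coarse_at=2):
        super().__init__()
        assert img_size % 4 == 0 and 1 <= coarse_at <= depth
        self.stem = MultiScaleStem()
        # 2x2 patch projection: total downsampling is 4x (stem 2x, patches 2x).
        self.proj = nn.Conv2d(self.stem.out_channels, embed_dim, kernel_size=2, stride=2)
        n_tokens = (img_size // 4) ** 2
        self.cls = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos = nn.Parameter(torch.zeros(1, n_tokens + 1, embed_dim))
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(embed_dim, heads, dim_feedforward=4 * embed_dim,
                                       batch_first=True)
            for _ in range(depth)
        )
        self.coarse_at = coarse_at
        self.norm_c, self.norm_f = nn.LayerNorm(embed_dim), nn.LayerNorm(embed_dim)
        self.coarse_head = nn.Linear(embed_dim, num_coarse)
        self.fine_head = nn.Linear(embed_dim, num_fine)

    def forward(self, x):
        feats = self.proj(self.stem(x))              # (B, D, H/4, W/4)
        tokens = feats.flatten(2).transpose(1, 2)    # (B, N, D) patch tokens
        cls = self.cls.expand(tokens.size(0), -1, -1)
        h = torch.cat([cls, tokens], dim=1) + self.pos
        coarse_logits = None
        for i, layer in enumerate(self.layers, start=1):
            h = layer(h)  # global self-attention over all tokens
            if i == self.coarse_at:
                # Coarse labels are predicted from a shallower representation.
                coarse_logits = self.coarse_head(self.norm_c(h[:, 0]))
        fine_logits = self.fine_head(self.norm_f(h[:, 0]))
        return coarse_logits, fine_logits


def hierarchical_loss(coarse_logits, fine_logits, coarse_y, fine_y, alpha=0.5):
    """Hypothetical weighted sum of per-level cross-entropy losses."""
    return alpha * F.cross_entropy(coarse_logits, coarse_y) + F.cross_entropy(fine_logits, fine_y)
```

For example, `HierarchicalHybrid()(torch.randn(2, 3, 32, 32))` returns coarse logits of shape `(2, 20)` and fine logits of shape `(2, 100)`. Attaching the coarse head to an earlier layer is just one design choice; summing per-level losses is a common, simple baseline for training hierarchical classifiers.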