An attention model based on spatial transformers for scene recognition
Author: | Songyang Lao, Wei Wang, Liang Wang, Li Liu, Shuxuan Guo |
---|---|
Year of publication: | 2016 |
Subject: | business.industry; Computer science; Feature extraction; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; Pattern recognition; 02 engineering and technology; Attention model; 010501 environmental sciences; 01 natural sciences; Convolutional neural network; Visualization; Discriminative model; 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; Computer vision; Artificial intelligence; business; 0105 earth and related environmental sciences; Transformer (machine learning model) |
Source: | ICPR |
DOI: | 10.1109/icpr.2016.7900219 |
Description: | Scene recognition is an important and challenging task in computer vision. We propose an end-to-end pipeline that combines convolutional neural networks (CNNs) with an explicit attention model to determine several meaningful regions of the original images for scene recognition. In the proposed pipeline, a spatial transformer network is used as the attention module, which automatically learns the scales and the movements of the centers of the attention windows. For feature extraction, a basic CNN architecture is used. Stronger scene descriptors are then constructed by feature fusion. The highlight of the proposed network is that it is capable of localizing discriminative regions in an image in a data-driven manner, without any additional supervision. We conduct experiments on a subset of the Places205 database to evaluate the performance of the proposed basic network and the parameters involved. Our model achieves a state-of-the-art top-1 accuracy of 82.10% on the evaluation dataset, compared with 80.98% for the fine-tuned PlacesCNN. We find that our model is able to learn informative attention regions for discriminating scene categories. |
Database: | OpenAIRE |
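
The description says the attention module learns the scale and the center movement of each attention window. A minimal sketch of what such a window could look like, assuming a restricted affine transform `[[s, 0, tx], [0, s, ty]]` over normalized coordinates in [-1, 1] and bilinear sampling: the function names are hypothetical, the parameters `scale`, `tx`, and `ty` are passed in by hand here (in a spatial transformer they would be predicted by a trainable localization network), and this is not the authors' implementation.

```python
import math


def attention_grid(scale, tx, ty, out_h, out_w):
    """Source coordinates (normalized to [-1, 1]) for every output pixel.

    Assumed transform: x_src = scale * x_out + tx, y_src = scale * y_out + ty.
    """
    grid = []
    for i in range(out_h):
        y_out = -1.0 + 2.0 * i / (out_h - 1) if out_h > 1 else 0.0
        row = []
        for j in range(out_w):
            x_out = -1.0 + 2.0 * j / (out_w - 1) if out_w > 1 else 0.0
            row.append((scale * x_out + tx, scale * y_out + ty))
        grid.append(row)
    return grid


def bilinear_sample(image, x_n, y_n):
    """Bilinearly interpolate a 2D list at normalized coords; outside is 0."""
    h, w = len(image), len(image[0])
    # Map [-1, 1] onto pixel indices [0, w - 1] and [0, h - 1].
    x = (x_n + 1.0) * (w - 1) / 2.0
    y = (y_n + 1.0) * (h - 1) / 2.0
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= xx < w and 0 <= yy < h:
                weight = max(0.0, 1.0 - abs(x - xx)) * max(0.0, 1.0 - abs(y - yy))
                value += weight * image[yy][xx]
    return value


def crop_attention_window(image, scale, tx, ty, out_h, out_w):
    """Extract one attention window from a single-channel image."""
    grid = attention_grid(scale, tx, ty, out_h, out_w)
    return [[bilinear_sample(image, x, y) for (x, y) in row] for row in grid]
```

With `scale=1.0` and `tx=ty=0.0` the window covers the whole image; a smaller `scale` zooms in, and `tx`, `ty` shift the window center. Because the sampling is differentiable with respect to these three parameters, a network can learn them from the classification loss alone, which matches the description's claim that no additional supervision is needed.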