Deeply Encoding Stable Patterns From Contaminated Data for Scenery Image Recognition
Authors: | Xiaoming Ju, Xuelong Li, Luming Zhang, Yongheng Shang |
---|---|
Year of publication: | 2021 |
Subject: |
Computer science; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; Pattern recognition; Semantics; Computer Science Applications; Human-Computer Interaction; Generative model; Categorization; Kernel (image processing); Control and Systems Engineering; Embedding; Artificial intelligence; Electrical and Electronic Engineering; Software; Subspace topology; Information Systems |
Source: | IEEE Transactions on Cybernetics. 51:5671-5680 |
ISSN: | 2168-2275 2168-2267 |
DOI: | 10.1109/tcyb.2019.2951798 |
Description: | Effectively recognizing sceneries with complex backgrounds and varied lighting conditions plays an important role in modern AI systems. Deep scene categorization models have recently achieved competitive performance. However, these models implicitly assume that the image-level labels are 100% correct, which is too restrictive. In practice, the image-level labels for massive-scale scenery sets are usually calculated by external predictors such as ImageNet-CN. These labels can easily become contaminated because no predictor is completely accurate. This article proposes a new deep architecture that calculates scene categories by hierarchically deriving stable templates, which are discovered using a generative model. Specifically, we first construct a semantic space by incorporating image-level labels using subspace embedding. We then observe that, in the semantic space, the superpixel distributions from identically labeled images remain unchanged regardless of image-level label noise. On the basis of this observation, a probabilistic generative model learns the stable templates for each scene category. To deeply represent each scenery category, a novel aggregation network is developed to statistically concatenate the CNN features learned from scene annotations predicted by HSA. Finally, the learned deep representations are integrated into an image kernel, which is subsequently incorporated into a multiclass SVM for distinguishing scene categories. Thorough experiments demonstrate the performance of our method. As a byproduct, an empirical study of 33 SIFT-flow categories shows that the learned stable templates remain almost unchanged under an image label contamination rate of nearly 36%. |
Database: | OpenAIRE |
External link: |
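
The final stage named in the description, where learned representations are integrated into an image kernel that feeds a multiclass SVM, can be illustrated with a minimal sketch. The record does not specify the kernel, so an RBF kernel is assumed here purely for illustration; the feature vectors, the `gamma` value, and the function names `rbf_kernel` and `gram_matrix` are hypothetical and are not taken from the article.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """RBF similarity between two equal-length feature vectors (assumed kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def gram_matrix(features, gamma=0.5):
    """Pairwise kernel values for a list of feature vectors."""
    return [[rbf_kernel(u, v, gamma) for v in features] for u in features]


# Hypothetical 3-D stand-ins for per-image deep representations:
# the first two mimic one scene category, the last two another.
features = [
    [0.9, 0.1, 0.0],
    [1.0, 0.2, 0.1],
    [0.0, 0.8, 1.0],
    [0.1, 0.9, 0.9],
]

K = gram_matrix(features)

# Print the Gram matrix, one row per image.
for row in K:
    print(" ".join(f"{value:.3f}" for value in row))
```

In this toy data, images from the same hypothetical category receive kernel values close to 1, while cross-category pairs score much lower. A square matrix of this form is the kind of input accepted by SVM implementations that support precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.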