Few-Shot Common-Object Reasoning Using Common-Centric Localization Network
Author: | Linchao Zhu, Hehe Fan, Yi Yang, Mingliang Xu, Yawei Luo |
---|---|
Publication year: | 2021 |
Subject: | Relation (database); business.industry; Computer science; Feature extraction; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; Representation (systemics); Pattern recognition; 02 engineering and technology; Object (computer science); Computer Graphics and Computer-Aided Design; Object detection; Minimum bounding box; Feature (computer vision); 0202 electrical engineering, electronic engineering, information engineering; Graph (abstract data type); Artificial Intelligence & Image Processing; 020201 artificial intelligence & image processing; Artificial intelligence; business; Software; 0801 Artificial Intelligence and Image Processing; 0906 Electrical and Electronic Engineering; 1702 Cognitive Sciences |
Source: | IEEE Transactions on Image Processing. 30:4253-4262 |
ISSN: | 1941-0042 1057-7149 |
DOI: | 10.1109/tip.2021.3070733 |
Description: | In the few-shot common-localization task, given a few support images without bounding box annotations at each episode, the goal is to localize the common object in a query image of an unseen category. The task involves reasoning about the common object from the given images and predicting the spatial locations of objects with different shapes, sizes, and orientations. In this work, we propose a common-centric localization (CCL) network for few-shot common-localization. The motivation of our network is to learn the common object features by dynamic feature relation reasoning via a graph convolutional network with conditional feature aggregation. First, we propose a local common object region generation pipeline to reduce background noise due to feature misalignment. Each support image predicts more accurate object spatial locations by replacing the query with the images in the support set. Second, we introduce a graph convolutional network with dynamic feature transformation to enforce common object reasoning. To enhance discriminability during feature matching and enable better generalization in unseen scenarios, we leverage a conditional feature encoding function to adaptively alter visual features according to the input query. Third, we introduce a common-centric relation structure to model the correlation between the common features and the query image feature. The generated common features guide the query image feature towards a more common-object-related representation. We evaluate our common-centric localization network on four datasets, i.e., CL-VOC-07, CL-VOC-12, CL-COCO, and CL-VID. We obtain significant improvements compared to the state of the art. Our quantitative results confirm the effectiveness of our network. |
Database: | OpenAIRE |
External link: |