Instance-Aware, Context-Focused, and Memory-Efficient Weakly Supervised Object Detection

Authors: Zhongzheng Ren, Jan Kautz, Yong Jae Lee, Alexander G. Schwing, Ming-Yu Liu, Xiaodong Yang, Zhiding Yu
Publication year: 2020
Subject:
FOS: Computer and information sciences
Computer Science - Machine Learning
Computer science
Computer Vision and Pattern Recognition (cs.CV)
Computer Science - Computer Vision and Pattern Recognition
Context (language use)
02 engineering and technology
010501 environmental sciences
Machine learning
01 natural sciences
Machine Learning (cs.LG)
Discriminative model
FOS: Electrical engineering, electronic engineering, information engineering
0202 electrical engineering, electronic engineering, information engineering
0105 earth and related environmental sciences
Ground truth
Image and Video Processing (eess.IV)
Supervised learning
Electrical Engineering and Systems Science - Image and Video Processing
Object (computer science)
Object detection
Benchmark (computing)
020201 artificial intelligence & image processing
Artificial intelligence
Source: CVPR
DOI: 10.1109/cvpr42600.2020.01061
Description: Weakly supervised learning has emerged as a compelling tool for object detection by reducing the need for strong supervision during training. However, major challenges remain: (1) differentiation of object instances can be ambiguous; (2) detectors tend to focus on discriminative parts rather than entire objects; (3) without ground truth, object proposals must be redundant to reach high recall, which consumes significant memory. Addressing these challenges is difficult, as it often requires eliminating uncertainties and trivial solutions. To target these issues, we develop an instance-aware and context-focused unified framework. It employs an instance-aware self-training algorithm and a learnable Concrete DropBlock while devising a memory-efficient sequential batch back-propagation. Our proposed method achieves state-of-the-art results on COCO ($12.1\%$ AP, $24.8\%$ AP$_{50}$), VOC 2007 ($54.9\%$ AP), and VOC 2012 ($52.1\%$ AP), improving on the baselines by large margins. In addition, the proposed method is the first to benchmark ResNet-based models and weakly supervised video object detection. Code, models, and more details will be made available at: https://github.com/NVlabs/wetectron.
CVPR 2020
Database: OpenAIRE