Context-Aware RCNN: A Baseline for Action Detection in Videos

Author:	Zhanghui Kuang, Jianchao Wu, Gangshan Wu, Wayne Zhang, Limin Wang
Year of publication:	2020
Subject:	Computer science; business.industry; Feature extraction; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; Context (language use); 02 engineering and technology; 010501 environmental sciences; Machine learning; computer.software_genre; 01 natural sciences; Pipeline (software); Action (philosophy); Minimum bounding box; 0202 electrical engineering, electronic engineering, information engineering; Feature (machine learning); 020201 artificial intelligence & image processing; Artificial intelligence; business; computer; 0105 earth and related environmental sciences
Source:	Computer Vision – ECCV 2020, ISBN: 9783030585945, ECCV (25)
DOI:	10.1007/978-3-030-58595-2_27
Description:	Video action detection approaches usually conduct actor-centric action recognition over RoI-pooled features, following the standard pipeline of Faster-RCNN. In this work, we first find empirically that recognition accuracy is highly correlated with the bounding box size of an actor, and thus that higher actor resolution contributes to better performance. However, video models require dense sampling in time to achieve accurate recognition. To fit in GPU memory, the frames fed to the backbone network must be kept at low resolution, resulting in a coarse feature map at the RoI-Pooling layer. Thus, we revisit RCNN for actor-centric action recognition by cropping and resizing image patches around actors before feature extraction with an I3D deep network. Moreover, we find that expanding actor bounding boxes slightly and fusing the context features can further boost performance. Consequently, we develop a surprisingly effective baseline (Context-Aware RCNN), which achieves new state-of-the-art results on two challenging action detection benchmarks, AVA and JHMDB. Our observations challenge the conventional wisdom of the RoI-Pooling based pipeline and encourage researchers to rethink the importance of resolution in actor-centric action recognition. Our approach can serve as a strong baseline for video action detection and is expected to inspire new ideas for this field. The code is available at https://github.com/MCG-NJU/CRCNN-Action.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::4b637db080168e47be019cdc70cfbc35 https://doi.org/10.1007/978-3-030-58595-2_27
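
Note:	The description mentions expanding actor bounding boxes slightly and cropping and resizing image patches around actors before feature extraction. The following is a minimal NumPy sketch of that preprocessing idea only, not the authors' implementation (see the repository linked in the description for that). The function names, the (x1, y1, x2, y2) box layout, the expansion factor, the 224x224 output size, and the nearest-neighbour resizing are all assumptions made for illustration.

```python
import numpy as np


def expand_box(box, scale, width, height):
    """Scale a box (x1, y1, x2, y2) about its centre and clip it to the frame.

    `scale` is a hypothetical expansion factor; the record does not state
    the value used in the paper.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) * scale / 2.0
    half_h = (y2 - y1) * scale / 2.0
    return (
        max(0.0, cx - half_w),
        max(0.0, cy - half_h),
        min(float(width), cx + half_w),
        min(float(height), cy + half_h),
    )


def crop_and_resize(clip, box, out_size):
    """Crop `box` from every frame of `clip` (T, H, W, C) and resize the patch.

    Nearest-neighbour sampling keeps the sketch dependency-free; a real
    pipeline would more likely use bilinear interpolation.
    """
    x1, y1, x2, y2 = box
    out_h, out_w = out_size
    # Sample the centres of out_h x out_w equal cells laid over the box.
    ys = (y1 + (np.arange(out_h) + 0.5) * (y2 - y1) / out_h).astype(int)
    xs = (x1 + (np.arange(out_w) + 0.5) * (x2 - x1) / out_w).astype(int)
    ys = np.clip(ys, 0, clip.shape[1] - 1)
    xs = np.clip(xs, 0, clip.shape[2] - 1)
    return clip[:, ys[:, None], xs[None, :], :]


# Usage: an 8-frame low-resolution clip with one detected actor box.
clip = np.random.rand(8, 100, 100, 3)
actor_box = (40.0, 40.0, 60.0, 60.0)
context_box = expand_box(actor_box, scale=1.5, width=100, height=100)
actor_patch = crop_and_resize(clip, context_box, out_size=(224, 224))
```

The resized patch, rather than the full low-resolution frame, would then be passed to the video backbone, so the actor occupies the whole input resolution.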