Lightweight Action Recognition in Compressed Videos
Author: | Xiaoli Xu, Yao Lu, Ji-Rong Wen, Mingyu Ding, Yulei Niu, Zhiwu Lu, Yuqi Huo, Tao Xiang |
---|---|
Year of publication: | 2020 |
Subject: | |
Source: | Computer Vision – ECCV 2020 Workshops; ISBN: 9783030660956; ECCV Workshops (2) |
DOI: | 10.1007/978-3-030-66096-3_24 |
Description: | Most existing action recognition models are large convolutional neural networks that work only with raw RGB frames as input. However, practical applications require lightweight models that directly process compressed videos. In this work, for the first time, such a model is developed: one lightweight enough to run in real time on embedded AI devices without sacrificing recognition accuracy. A new Aligned Temporal Trilinear Pooling (ATTP) module is formulated to fuse three modalities in a compressed video. Because motion vectors represent dynamic content less well than optical flow computed from raw RGB streams, we introduce a temporal fusion method to explicitly induce the temporal context, and we apply knowledge distillation, via feature alignment, from a model trained with optical flow. Compared to existing compressed video action recognition models, the proposed model is much more compact and faster thanks to its lightweight CNN backbone. |
Database: | OpenAIRE |
External link: |