Temporal Segment Networks for Action Recognition in Videos

Authors: Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, Luc Van Gool
Year of publication: 2019
Subject:
FOS: Computer and information sciences
Computer Science - Computer Vision and Pattern Recognition (cs.CV)
Action recognition
temporal modeling
temporal segment networks
good practices
ConvNets
Artificial Intelligence
Pattern recognition
Computational Theory and Mathematics
Applied Mathematics
Software
Source: IEEE Transactions on Pattern Analysis and Machine Intelligence. 41:2740-2755
ISSN: 1939-3539 (electronic), 0162-8828 (print)
DOI: 10.1109/tpami.2018.2868668
Description: Deep convolutional networks have achieved great success for image recognition. For action recognition in videos, however, their advantage over traditional methods is less evident. We present a general and flexible video-level framework for learning action models in videos. This method, called temporal segment network (TSN), aims to model long-range temporal structure with a new segment-based sampling and aggregation module. This unique design enables TSN to learn action models efficiently from whole action videos. The learned models can be easily adapted to action recognition in both trimmed and untrimmed videos, using simple average pooling and multi-scale temporal window integration, respectively. We also study a series of good practices for instantiating the TSN framework given limited training samples. Our approach obtains state-of-the-art performance on four challenging action recognition benchmarks: HMDB51 (71.0%), UCF101 (94.9%), THUMOS14 (80.1%), and ActivityNet v1.2 (89.6%). Using the proposed RGB difference as a motion representation, our method can still achieve competitive accuracy on UCF101 (91.0%) while running at 340 FPS. Furthermore, based on temporal segment networks, we won the video classification track of the ActivityNet challenge 2016 among 24 teams, which demonstrates the effectiveness of TSN and the proposed good practices.
Comment: 14 pages. An extension of the submission at https://arxiv.org/abs/1608.00859
Database: OpenAIRE