Revisiting Hard Example for Action Recognition
Authors: | Li Shiren, Hu Jianguo, Wang Jinpeng, Zhihao Yuan |
Year of publication: | 2021 |
Subject: | Similarity (geometry), Computer science, Pattern recognition, Field (computer science), Task (project management), Kernel (linear algebra), Discriminative model, Feature (machine learning), Overhead (computing), Artificial intelligence, Media Technology, Electrical and Electronic Engineering, business.industry, business, 02 engineering and technology, 0202 electrical engineering electronic engineering information engineering, 020201 artificial intelligence & image processing |
Source: | IEEE Transactions on Circuits and Systems for Video Technology, 31:546-556 |
ISSN: | 1558-2205, 1051-8215 |
DOI: | 10.1109/tcsvt.2020.2978855 |
Description: | Video-based action recognition, which must handle temporal motion and spatial cues simultaneously, remains a challenging task. In this paper, our motivation is to address this issue by fully exploiting temporal information. Specifically, a novel lightweight Voting-based Temporal Correlation (VTC) module is proposed to enhance temporal information. The module contains multiple branches with different temporal sampling intervals, which are regarded as voters; the final classification result is “voted” on by these branches together. The VTC module integrates a sparse temporal sampling strategy into the feature sequences, so it mitigates the effect of redundant information and focuses more on temporal modeling. Additionally, we propose a simple and intuitive Similarity Loss (SL) to guide the training of the VTC module and the backbone network. By intentionally introducing confusion into the predicted vector, SL eases intra-class variation, discovering class-specific common motion patterns rather than sample-specific discriminative information. SL neither requires excessive parameter tuning during training nor adds significant computational overhead at test time. By combining the VTC module and SL with complementary advances in the field, we clearly outperform state-of-the-art results, achieving 83.0%, 98.4%, 49.6%, and 77.8% accuracy on HMDB51, UCF101, something-something-v1, and Kinetics, respectively. |
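The description is textual only; as a concrete illustration, here is a minimal PyTorch sketch of the two ideas, assuming a multi-branch “voting” head over per-frame backbone features and a label-smoothing-style surrogate for the confusion step of the Similarity Loss. This is not the authors' code: `VTCModule`, `similarity_loss`, the branch strides, and all layer sizes are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class VTCModule(nn.Module):
    """Hypothetical sketch of a Voting-based Temporal Correlation head.

    Each branch subsamples the per-frame feature sequence at a different
    temporal stride (its "sampling interval"), models it with a small
    temporal convolution, and emits class logits. Averaging the branch
    outputs plays the role of the branches "voting" together.
    """

    def __init__(self, feat_dim, num_classes, strides=(1, 2, 4)):
        super().__init__()
        self.strides = strides
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv1d(feat_dim, feat_dim, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.AdaptiveAvgPool1d(1),   # pool over the time axis
                nn.Flatten(),
                nn.Linear(feat_dim, num_classes),
            )
            for _ in strides
        ])

    def forward(self, feats):
        # feats: (batch, time, feat_dim) features from a 2D/3D backbone.
        feats = feats.transpose(1, 2)            # -> (batch, feat_dim, time)
        votes = [
            branch(feats[:, :, ::stride])        # sparse temporal sampling
            for branch, stride in zip(self.branches, self.strides)
        ]
        return torch.stack(votes).mean(dim=0)    # average the branch "votes"


def similarity_loss(logits, labels, smoothing=0.1):
    """Hypothetical Similarity Loss surrogate.

    The "intentional confusion" is modeled by softening the one-hot target:
    a little probability mass is spread over the wrong classes, pushing the
    network toward class-common patterns instead of sample-specific cues.
    """
    num_classes = logits.size(1)
    confused = torch.full_like(logits, smoothing / (num_classes - 1))
    confused.scatter_(1, labels.unsqueeze(1), 1.0 - smoothing)
    return F.kl_div(F.log_softmax(logits, dim=1), confused,
                    reduction="batchmean")


if __name__ == "__main__":
    vtc = VTCModule(feat_dim=512, num_classes=174)   # sizes are illustrative
    feats = torch.randn(8, 16, 512)                  # 8 clips, 16 frames each
    logits = vtc(feats)                              # (8, 174) voted logits
    labels = torch.randint(0, 174, (8,))
    print(similarity_loss(logits, labels).item())
```

Averaging logits is the simplest voting rule; a weighted or learned combination of branches would be a natural variant, but nothing in the record specifies which scheme the paper uses.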
Database: | OpenAIRE |
External link: |