A Joint Framework for Athlete Tracking and Action Recognition in Sports Videos
Authors: | Jie Qin, Longteng Kong, Yunhong Wang, Di Huang |
---|---|
Publication year: | 2020 |
Subject: | business.industry; Computer science; Feature extraction; Frame (networking); ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; 02 engineering and technology; Tracking (particle physics); Discriminative model; Video tracking; 0202 electrical engineering, electronic engineering, information engineering; Media Technology; Benchmark (computing); Feature (machine learning); 020201 artificial intelligence & image processing; Computer vision; Artificial intelligence; Electrical and Electronic Engineering; business |
Source: | IEEE Transactions on Circuits and Systems for Video Technology. 30:532-548 |
ISSN: | 1558-2205 1051-8215 |
DOI: | 10.1109/tcsvt.2019.2893318 |
Description: | Sports video analysis has received increasing attention in recent years. Athlete tracking and action recognition are its two major issues, and they are highly related to each other; however, existing studies consider and process them separately. In this paper, we propose a joint framework for athlete tracking and action recognition in sports videos. For athlete tracking, we propose a scaling and occlusion robust tracker, named scaling and occlusion robust compressive tracking (CT), to localize a specific athlete in each frame. It follows the approach of CT but extends it in two aspects, i.e., scale refinement and occlusion recovery. For the former, an objectness method, edge box, is adopted to generate proposals, which replace the fixed sampling boxes in CT and better fit the scales of the candidate objects. For the latter, a candidate obstruction-based solution is presented, which brings in additional trackers to detect possible obstructions and to relocate the target when occlusion ends. For action recognition, we propose a long-term recurrent region-guided convolutional network, which recognizes pre-defined actions by modeling discriminative temporal cues of the tracking results. We employ SPP-net to extract a robust feature from the tracked region of each frame. The features of all the frames are then fed into a stack of recurrent sequence models to capture long-term region-level information. We extensively evaluate the proposed approach on a newly collected sports video benchmark and on the off-the-shelf UIUC2 dataset, and the experimental results clearly show its effectiveness. |
Database: | OpenAIRE |
External link: |
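
Note on the description: the abstract says SPP-net extracts a feature from the tracked region of each frame before the features are passed to recurrent models. Because tracked regions change scale, one plausible reason for this choice is that spatial pyramid pooling maps a feature map of any spatial size to a vector of fixed length. The following is a minimal NumPy sketch of that pooling step only. The function name `spatial_pyramid_pool`, the pyramid levels `(1, 2, 4)`, and the use of max pooling are assumptions for illustration; the record does not state the paper's actual configuration.

```python
import numpy as np


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a (C, H, W) feature map into a fixed-length vector.

    Hypothetical helper: each level n splits the map into an n x n grid
    of bins, and every bin contributes one max value per channel.
    """
    channels, height, width = feature_map.shape
    pooled = []
    for n in levels:
        for i in range(n):
            # Row range of bin i: floor for the start, ceil for the end,
            # so each bin is non-empty even when height < n.
            r0 = (i * height) // n
            r1 = -(-((i + 1) * height) // n)
            for j in range(n):
                c0 = (j * width) // n
                c1 = -(-((j + 1) * width) // n)
                pooled.append(feature_map[:, r0:r1, c0:c1].max(axis=(1, 2)))
    # Output length is channels * sum(n * n for n in levels).
    return np.concatenate(pooled)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    small_region = rng.random((8, 5, 9))    # stand-in for a small tracked box
    large_region = rng.random((8, 13, 7))   # stand-in for a larger tracked box
    print(spatial_pyramid_pool(small_region).shape)
    print(spatial_pyramid_pool(large_region).shape)
```

Both calls return vectors of the same length despite the different input sizes, which is what allows per-frame features from boxes of varying scale to be stacked into one sequence for a recurrent model.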