Spatially and Temporally Structured Global to Local Aggregation of Dynamic Depth Information for Action Recognition
Author: Yonghong Hou, Shuang Wang, Zhimin Gao, Pichao Wang, Wanqing Li
Language: English
Year of publication: 2018
Subject: General Computer Science; General Engineering; General Materials Science; 3D action recognition; depth maps; feature extraction; convolutional neural networks (ConvNets); structured motion images; skeleton; rank pooling; pattern recognition; artificial intelligence
Source: IEEE Access, Vol. 6, pp. 2206-2219 (2018)
ISSN: 2169-3536
Description: This paper presents an effective yet simple video representation for RGB-D-based action recognition. It proposes to represent a depth map sequence as three pairs of structured dynamic images (DIs) at the body, part, and joint levels, respectively, through hierarchical bidirectional rank pooling. Unlike previous works that applied one convolutional neural network (ConvNet) to each part/joint separately, one pair of structured DIs is constructed from the depth maps at each granularity level and serves as the input to a single ConvNet. The structured DI not only preserves spatial-temporal information but also enhances structural information across both body parts/joints and different temporal scales. In addition, it requires little computational cost and memory to construct. This new representation, referred to as Spatially and Temporally Structured Dynamic Depth Images, aggregates motion and structure information in a depth sequence from the global to the fine-grained level, and enables existing ConvNet models trained on image data to be fine-tuned for classifying depth sequences without training the models from scratch. The proposed representation is evaluated on six benchmark data sets, namely MSRAction3D, G3D, MSRDailyActivity3D, SYSU 3D HOI, UTD-MHAD, and M2I, and achieves state-of-the-art results on all six. (A minimal sketch of the core rank pooling step follows this record.)
Database: OpenAIRE
External link:
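The abstract describes collapsing a depth map sequence into forward/backward dynamic images via bidirectional rank pooling. The following is a minimal sketch of that core step only, assuming the closed-form approximate rank pooling coefficients of Bilen et al. rather than a learned ranking machine; the paper itself applies this hierarchically at body, part, and joint granularities. Function names and the (T, H, W) array layout are hypothetical.

```python
import numpy as np

def approx_rank_pool(frames):
    """Collapse a (T, H, W) stack of depth frames into one dynamic image.

    Uses the approximate rank pooling coefficients of Bilen et al.:
        alpha_t = 2*(T - t + 1) - (T + 1) * (H_T - H_{t-1}),
    where H_t is the t-th harmonic number and t is 1-indexed.
    """
    T = len(frames)
    # harmonic[t] = H_t, with H_0 = 0.
    harmonic = np.concatenate(([0.0], np.cumsum(1.0 / np.arange(1, T + 1))))
    t = np.arange(1, T + 1)
    alpha = 2.0 * (T - t + 1) - (T + 1) * (harmonic[T] - harmonic[t - 1])
    # Weighted sum over the temporal axis yields one (H, W) image.
    di = np.tensordot(alpha, frames.astype(np.float64), axes=1)
    # Rescale to 8-bit so the result can feed an image-pretrained ConvNet.
    di -= di.min()
    if di.max() > 0:
        di /= di.max()
    return (255 * di).astype(np.uint8)

def bidirectional_dynamic_images(depth_seq):
    """Return one forward/backward pair of dynamic images from a
    (T, H, W) depth sequence, mirroring bidirectional rank pooling."""
    return approx_rank_pool(depth_seq), approx_rank_pool(depth_seq[::-1])
```

Under these assumptions, applying `bidirectional_dynamic_images` to the full-body depth maps and to part- and joint-level crops, then arranging the outputs in a fixed spatial layout, would roughly correspond to the three pairs of structured DIs the abstract describes as ConvNet inputs.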