Symmetrical Enhanced Fusion Network for Skeleton-Based Action Recognition
Authors: | Min Jiang, Haoyang Deng, Jun Kong |
---|---|
Publication year: | 2021 |
Subject: | Fusion; Computer science; Feature extraction; Pattern recognition; Skeleton (category theory); Field (computer science); Data modeling; Media Technology; Feature (machine learning); Graph (abstract data type); Artificial intelligence; Electrical and Electronic Engineering; Focus (optics) |
Source: | IEEE Transactions on Circuits and Systems for Video Technology. 31:4394-4408 |
ISSN: | 1558-2205 1051-8215 |
DOI: | 10.1109/tcsvt.2021.3050807 |
Description: | This article proposes a novel method for skeleton-based action recognition that fuses multi-level spatial features with multi-level temporal features. Recently, Graph Convolutional Networks (GCNs) for skeleton-based action recognition have attracted the attention of many researchers and achieved strong performance in the field. However, most of these works focus on changing the architecture of a single-stream network and use only simple methods such as average fusion to fuse different forms of skeleton data. In this article, we shift the focus to the problem that interactions between the different forms of features are insufficient, so that networks are unable to fully capture effective information from skeleton data. To tackle this problem, we propose a multi-stream network called the Symmetrical Enhanced Fusion Network (SEFN). The network is composed of a spatial stream, a temporal stream and a fusion stream. The spatial stream extracts spatial features from skeleton data with a GCN. The temporal stream extracts temporal features from skeleton data with the help of the embedded Motion Sequence Calculation Algorithm. The fusion stream provides an early fusion method and extra fusion information for the whole network. It gathers multi-level features from the two feature-extraction streams and fuses them with the proposed Multi-perspective Attention Fusion Module (MPAFM). The MPAFM enables different forms of data to enhance each other and strengthens feature extraction. Finally, we generalize the skeleton data from joint data to bone data and evaluate our network on three large-scale benchmarks: NTU-RGBD, NTU-RGBD 120 and Kinetics-Skeleton. Experimental results demonstrate that our method achieves competitive performance. |
Database: | OpenAIRE |
External link: |
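
The description mentions two data transformations without giving their formulas: deriving bone data from joint data, and computing a motion sequence for the temporal stream. The sketch below illustrates one common reading of each, and it is not taken from the paper. It assumes that a bone is the vector from a parent joint to its child and that motion is the per-joint displacement between consecutive frames, used here as a stand-in because the record does not specify the Motion Sequence Calculation Algorithm. The names `bone_frame`, `motion_sequence` and the `parents` list are hypothetical.

```python
from typing import List, Sequence, Tuple

Joint = Tuple[float, ...]   # one joint as (x, y, z) coordinates
Frame = List[Joint]         # one pose: a list of joints


def bone_frame(frame: Sequence[Joint], parents: Sequence[int]) -> Frame:
    """Assumed bone definition: joint j minus its parent joint.

    parents[j] is the index of the parent of joint j. The root is listed
    as its own parent, so its bone is the zero vector.
    """
    return [
        tuple(c - p for c, p in zip(frame[j], frame[parents[j]]))
        for j in range(len(frame))
    ]


def motion_sequence(frames: Sequence[Frame]) -> List[Frame]:
    """Assumed motion definition: displacement of every joint between
    consecutive frames, so T input frames give T - 1 motion frames."""
    return [
        [tuple(b - a for a, b in zip(ja, jb)) for ja, jb in zip(fa, fb)]
        for fa, fb in zip(frames, frames[1:])
    ]


# Toy skeleton: a 3-joint chain (0 = root, 1 = child of 0, 2 = child of 1).
parents = [0, 0, 1]
frames = [
    [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 2.0, 0.0)],
    [(1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (1.0, 3.0, 0.0)],
]

bones = bone_frame(frames[0], parents)
motion = motion_sequence(frames)
```

In a multi-stream setup like the one described, arrays of this kind would be the inputs to the separate streams; how SEFN actually fuses them (the MPAFM) is not detailed in this record and is not sketched here.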