Edge Computing-Based Video Action Recognition Method and Its Application in Online Physical Education Teaching

Authors:	Jinzhu Han, Jinjin Zhao, Yan Yue, Xinrui Che
Language:	English
Publication year:	2024
Subject:	Video action recognition; online PE teaching; lightweight network; spatial-temporal pruning; cross-temporal token interaction; Electrical engineering. Electronics. Nuclear engineering; TK1-9971
Source:	IEEE Access, Vol 12, Pp 148666-148676 (2024)
Document type:	article
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2024.3475372
Description:	Due to the impact of COVID-19, online physical education (PE) teaching has garnered increasing attention. Given the characteristics of online PE teaching, introducing artificial intelligence technology to automatically detect or recognize students’ actions or behaviors has gradually emerged as a trend. However, traditional cloud computing-based intelligent online PE teaching systems often face various challenging issues, such as computational complexity and latency. Edge computing can address these problems. However, edge devices typically have limited computing power, while existing deep action recognition models often contain a large number of parameters and require significant computational resources, making them difficult to deploy on edge devices. To address the above issues, this paper proposes a lightweight video recognition method, named the lightweight video ViT (LWV-ViT) network. More specifically, based on the standard ViT model, the video-based ViT (VBViT) network is first introduced by developing a cross-temporal token interaction module to effectively process temporal information in videos. Furthermore, the LWV-ViT network is proposed by implementing a spatial-temporal pruning scheme to reduce the number of parameters. Finally, the proposed LWV-ViT network is deployed in an edge computing-based online PE teaching system, where it is installed on each edge device. This setup enables fast data processing, reduces transmission latency, and protects sensitive data. Experimental results show that the proposed LWV-ViT network achieves the best recognition rates for both behavior detection (96.5%, 95.73%) and action recognition (97.9%, 88.3%, 79.9%) tasks, and has the fewest trainable parameters (2.7 M), which means it performs well in edge computing-based online PE teaching systems.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/249c14ab695147df98e2e1cbd1fc22d6
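
The abstract in this record mentions a spatial-temporal pruning scheme but gives no details of how it works. As a rough illustration of the general idea only, the sketch below drops low-importance frames (temporal) and then low-importance tokens within each remaining frame (spatial). Everything in it is an assumption made for illustration: the function names `token_score`, `prune_tokens`, and `prune_frames`, the squared-norm importance score, and the keep ratios are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

Token = Sequence[float]


def token_score(token: Token) -> float:
    """Hypothetical importance score: squared L2 norm of the token."""
    return sum(x * x for x in token)


def _check_ratio(ratio: float) -> None:
    if not 0.0 < ratio <= 1.0:
        raise ValueError("keep ratio must be in (0, 1]")


def prune_tokens(frames: List[List[Token]], keep_ratio: float) -> List[List[Token]]:
    """Spatial step: keep the highest-scoring tokens in each frame."""
    _check_ratio(keep_ratio)
    pruned = []
    for tokens in frames:
        k = max(1, round(len(tokens) * keep_ratio))
        # Rank token indices by score, keep the top k, restore original order.
        ranked = sorted(range(len(tokens)),
                        key=lambda i: token_score(tokens[i]), reverse=True)
        kept = sorted(ranked[:k])
        pruned.append([tokens[i] for i in kept])
    return pruned


def prune_frames(frames: List[List[Token]], keep_ratio: float) -> List[List[Token]]:
    """Temporal step: keep the frames with the highest total token score."""
    _check_ratio(keep_ratio)
    k = max(1, round(len(frames) * keep_ratio))
    totals = [sum(token_score(t) for t in tokens) for tokens in frames]
    ranked = sorted(range(len(frames)), key=lambda i: totals[i], reverse=True)
    kept = sorted(ranked[:k])  # preserve temporal order
    return [frames[i] for i in kept]


# Toy clip: 3 frames, 4 two-dimensional tokens per frame (made-up values).
clip = [
    [[0.1, 0.0], [3.0, 4.0], [1.0, 1.0], [0.0, 0.2]],
    [[0.0, 0.0], [0.1, 0.1], [0.0, 0.1], [0.1, 0.0]],
    [[2.0, 0.0], [0.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
]
reduced = prune_tokens(prune_frames(clip, keep_ratio=0.67), keep_ratio=0.5)
```

In this toy run the near-empty middle frame is dropped and each remaining frame keeps its two strongest tokens. A real model would more plausibly score tokens with attention weights or a learned predictor; the norm is used here only to keep the sketch self-contained.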