Empowering lightweight video transformer via the kernel learning

Authors: Xiaoxi Liu, Ju Liu, Lingchen Gu
Language: English
Year of publication: 2024
Subject:
Source: Electronics Letters, Vol 60, Iss 9, Pp n/a-n/a (2024)
Document type: article
ISSN: 1350-911X
0013-5194
DOI: 10.1049/ell2.13215
Description: Video transformers achieve superior performance in video recognition. Despite recent advances, they still require substantial computation and memory resources. To improve computational efficiency, a kernel-based video transformer is proposed, with three contributions: (1) a new formulation of the video transformer via kernel learning, which helps clarify the role of its individual components; (2) a lightweight kernel-based spatial–temporal multi-head self-attention block that learns compact joint spatial–temporal video features; (3) an adaptive-score position embedding method that makes the video transformer more flexible. Experimental results on several action recognition datasets demonstrate the effectiveness of the proposed method. Pretrained only on ImageNet-1K, the method achieves a favorable balance between computation and accuracy, while requiring 7× fewer parameters and 13× fewer floating point operations than other comparable methods.
Database: Directory of Open Access Journals