A Coarse-to-Fine Framework for Resource Efficient Video Recognition

Authors:	Yingbin Zheng, Caiming Xiong, Zuxuan Wu, Yu-Gang Jiang, Larry S. Davis, Hengduo Li
Publication year:	2021
Subject:	Online and offline; Basis (linear algebra); Computer science; Computation; Machine learning; Resource (project management); Discriminative model; Pattern recognition (psychology); Computer Vision and Pattern Recognition; Artificial intelligence; Video recognition; Scale (map); Software
Source:	International Journal of Computer Vision. 129:2965-2977
ISSN:	1573-1405 0920-5691
DOI:	10.1007/s11263-021-01508-1
Description:	Deep neural networks have demonstrated remarkable recognition results on video classification; however, great improvements in accuracy come at the expense of large amounts of computational resources. In this paper, we introduce LiteEval for resource-efficient video recognition. LiteEval is a coarse-to-fine framework that dynamically allocates computation on a per-video basis and can be deployed in both online and offline settings. Operating by default on low-cost features computed from images at a coarse scale, LiteEval adaptively determines on the fly when to read in more discriminative yet computationally expensive features. This is achieved through the interaction of a coarse RNN and a fine RNN, together with a conditional gating module that automatically learns when to use more computation, conditioned on incoming frames. We conduct extensive experiments on three large-scale video benchmarks, FCVID, ActivityNet and Kinetics, and demonstrate, among other things, that LiteEval offers impressive recognition performance while using significantly less computation in both online and offline settings.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::5ab46ca2df45ea51b69b5299cdeff2d4 https://doi.org/10.1007/s11263-021-01508-1
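
Illustrative note: the coarse-to-fine gating idea summarized in the description can be pictured with a small toy sketch. This is not the authors' LiteEval implementation. Everything in it is an assumption made for illustration: the names (`coarse_feature`, `fine_feature`, `run_coarse_to_fine`), the running averages that stand in for the coarse and fine RNNs, the fixed novelty threshold that stands in for the learned conditional gate, the cost units, and the choice to leave the fine state unchanged on skipped frames.

```python
from dataclasses import dataclass
from typing import Sequence

# Hypothetical toy model: a "frame" is a non-empty flat list of pixel intensities.
Frame = Sequence[float]


@dataclass
class Result:
    fine_state: float  # final aggregated state (stand-in for the fine RNN state)
    fine_calls: int    # number of frames that triggered the expensive path
    cost: float        # total compute spent, in arbitrary units


def coarse_feature(frame: Frame) -> float:
    """Cheap feature: average of every 4th pixel (stand-in for a coarse-scale image)."""
    sub = frame[::4]
    return sum(sub) / len(sub)


def fine_feature(frame: Frame) -> float:
    """Expensive feature: average over all pixels (stand-in for a full-scale image)."""
    return sum(frame) / len(frame)


def run_coarse_to_fine(
    frames: Sequence[Frame],
    gate_threshold: float = 0.2,  # assumed fixed threshold; the paper's gate is learned
    coarse_cost: float = 1.0,     # assumed cost units
    fine_cost: float = 10.0,
) -> Result:
    coarse_state = 0.0
    fine_state = 0.0
    fine_calls = 0
    cost = 0.0
    for frame in frames:
        # The coarse path always runs, so every frame pays the small cost.
        c = coarse_feature(frame)
        cost += coarse_cost
        # Gate: how much does this frame differ from what the coarse state expects?
        novelty = abs(c - coarse_state)
        coarse_state = 0.5 * coarse_state + 0.5 * c
        if novelty > gate_threshold:
            # Only now pay for the expensive feature and update the fine state.
            f = fine_feature(frame)
            cost += fine_cost
            fine_calls += 1
            fine_state = 0.5 * fine_state + 0.5 * f
        # Otherwise the fine state is carried over unchanged (an assumption).
    return Result(fine_state, fine_calls, cost)
```

In this toy, a static video never opens the gate and pays only the coarse cost per frame, while a scene change opens the gate for a few frames until the coarse state catches up. That is the per-video allocation of computation the description refers to, in a heavily simplified form.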