MILA: Multi-Task Learning from Videos via Efficient Inter-Frame Attention

Author:	Donghyun Kim, Bryan A. Plummer, Stan Sclaroff, Ning Xu, Gerard Medioni, Chuhang Zou, Jayan Eledath, Tian Lan
Publication year:	2021
Subject:	FOS: Computer and information sciences; Computer science; Computer Vision and Pattern Recognition (cs.CV); Computer Science - Computer Vision and Pattern Recognition; Inter frame; Multi-task learning; Artificial intelligence
Source:	2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW).
DOI:	10.1109/iccvw54120.2021.00251
Description:	Prior work in multi-task learning has mainly focused on predictions on a single image. In this work, we present a new approach for multi-task learning from videos via efficient inter-frame local attention (MILA). Our approach contains a novel inter-frame attention module which allows learning of task-specific attention across frames. We embed the attention module in a "slow-fast" architecture, where the slower network runs on sparsely sampled keyframes and the light-weight shallow network runs on non-keyframes at a high frame rate. We also propose an effective adversarial learning strategy to encourage the slow and fast networks to learn similar features. Our approach ensures low-latency multi-task learning while maintaining high-quality predictions. Experiments show competitive accuracy compared to the state of the art on two multi-task learning benchmarks while reducing the number of floating point operations (FLOPs) by up to 70%. In addition, our attention-based feature propagation method (ILA) outperforms prior work in terms of task accuracy while also reducing FLOPs by up to 90%. Accepted at the ICCV 2021 MTL Workshop.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::996b8591dcbc1bb33e6e5ae31af8c3d3 https://doi.org/10.1109/iccvw54120.2021.00251
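
Illustration:	The description mentions inter-frame local attention, in which features computed on a keyframe are propagated to a non-keyframe. The record gives no implementation details, so the following is only a minimal NumPy sketch of one generic way such a module could work, not the authors' method. The function name `local_inter_frame_attention`, the `radius` parameter, the scaled dot-product scoring, and the zero-padding with masking are all assumptions made for illustration.

```python
import numpy as np

def local_inter_frame_attention(query_feat, key_feat, value_feat, radius=1):
    """Hypothetical sketch: each non-keyframe location attends to a local
    window of keyframe locations and returns a weighted sum of their values.

    query_feat: (H, W, C) features from the non-keyframe (fast network)
    key_feat:   (H, W, C) features from the keyframe (slow network)
    value_feat: (H, W, D) keyframe features to propagate
    """
    H, W, C = query_feat.shape
    k = 2 * radius + 1  # window side length

    # Zero-pad the keyframe maps so border locations have full windows,
    # and track which window positions fall inside the original frame.
    pad = ((radius, radius), (radius, radius), (0, 0))
    key_p = np.pad(key_feat, pad)
    val_p = np.pad(value_feat, pad)
    valid = np.pad(np.ones((H, W), dtype=bool), radius)

    out = np.zeros((H, W, value_feat.shape[-1]))
    for y in range(H):
        for x in range(W):
            keys = key_p[y:y + k, x:x + k].reshape(k * k, C)
            vals = val_p[y:y + k, x:x + k].reshape(k * k, -1)
            mask = valid[y:y + k, x:x + k].reshape(k * k)

            # Scaled dot-product scores; padded positions get zero weight.
            scores = keys @ query_feat[y, x] / np.sqrt(C)
            scores = np.where(mask, scores, -np.inf)
            scores = scores - scores.max()  # numerical stability
            weights = np.exp(scores)
            weights = weights / weights.sum()

            out[y, x] = weights @ vals
    return out
```

In this sketch the cost per location grows with the window size (2 * radius + 1)^2 rather than with the full H * W map, which is the usual motivation for restricting attention to a local neighborhood. With `radius=0` the function simply copies the keyframe value at the same location.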