Open-Vocabulary Temporal Action Detection with Off-the-Shelf Image-Text Features

Author:	Rathod, Vivek; Seybold, Bryan; Vijayanarasimhan, Sudheendra; Myers, Austin; Gu, Xiuye; Birodkar, Vighnesh; Ross, David A.
Publication year:	2022
Subject:	Computer Science - Computer Vision and Pattern Recognition
Document type:	Working Paper
Description:	Detecting actions in untrimmed videos should not be limited to a small, closed set of classes. We present a simple yet effective strategy for open-vocabulary temporal action detection utilizing pretrained image-text co-embeddings. We show that image-text co-embeddings, despite being trained on static images rather than videos, enable open-vocabulary performance competitive with fully supervised models. We show that the performance can be further improved by ensembling the image-text features with features encoding local motion, like optical-flow-based features, or other modalities, like audio. In addition, we propose a more reasonable open-vocabulary evaluation setting for the ActivityNet data set, where the category splits are based on similarity rather than random assignment.
Database:	arXiv
External link:	http://arxiv.org/abs/2212.10596