A closer look at referring expressions for video object segmentation
Authors: | Miriam Bellver, Carles Ventura, Carina Silberer, Ioannis Kazakos, Jordi Torres, Xavier Giro-i-Nieto |
---|---|
Contributors: | Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions |
Language: | English |
Publication year: | 2022 |
Subject: |
Informàtica::Intel·ligència artificial::Aprenentatge automàtic [Àrees temàtiques de la UPC]
Computer Networks and Communications; Deep learning; Pattern recognition systems; Enginyeria de la telecomunicació::Processament del senyal::Processament de la imatge i del senyal vídeo [Àrees temàtiques de la UPC]; Neural networks (Computer science); Hardware and Architecture; Referring expressions; Video object segmentation; Media Technology; Xarxes neuronals (Informàtica); Reconeixement de formes (Informàtica); Vision and language; Software; Aprenentatge profund |
Source: | UPCommons. Portal del coneixement obert de la UPC Universitat Politècnica de Catalunya (UPC) |
Description: | The task of Language-guided Video Object Segmentation (LVOS) aims at generating binary masks for an object referred to by a linguistic expression. When this expression unambiguously describes an object in the scene, it is called a referring expression (RE). Our work argues that existing benchmarks used for LVOS are mainly composed of trivial cases, in which referents can be identified with simple phrases. Our analysis relies on a new categorization of the referring expressions in the DAVIS-2017 and Actor-Action datasets into trivial and non-trivial REs, where the non-trivial REs are further annotated with seven RE semantic categories. We leverage these data to analyze the performance of RefVOS, a novel neural network that obtains competitive results for the task of language-guided image segmentation and state-of-the-art results for LVOS. Our study indicates that the major challenges for the task are related to understanding motion and static actions. Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This work was partially supported by the projects PID2019-107255GB-C22 and PID2020-117142GB-I00 funded by MCIN/AEI/10.13039/501100011033 Spanish Ministry of Science, and the grant 2017-SGR-1414 of the Government of Catalonia. This work was also partially supported by the project RTI2018-095232-B-C22 funded by the Spanish Ministry of Science, Innovation and Universities. |
Database: | OpenAIRE |
External link: |