Entity linking across vision and language

Authors: Marie-Francine Moens, Tinne Tuytelaars, Aparna Nurani Venkitasubramanian
Year of publication: 2017
Subject:
Source: Multimedia Tools and Applications. 76:22599-22622
ISSN: 1573-7721, 1380-7501
Description: We propose a novel weakly supervised framework that jointly tackles entity analysis tasks in vision and language. Given a video with subtitles, we jointly address two questions: (a) What do the textual entity mentions refer to? and (b) What or who appears in the video key frames? We use a Markov Random Field (MRF) to encode the dependencies within and across the two modalities. This MRF model incorporates beliefs obtained from independent methods for the textual and visual entities. These beliefs are propagated across the modalities to jointly derive the entity labels. We apply the framework to a challenging dataset of wildlife documentaries with subtitles and show that this integrated modelling yields significantly better performance than text-based and vision-based approaches. We show that textual mentions that cannot be resolved using text-only methods are resolved correctly using our method. The approaches described here bring us closer to automated multimedia indexing.
Citation: Nurani Venkitasubramanian A., Tuytelaars T., Moens M.-F., "Entity linking across vision and language", Multimedia Tools and Applications, vol. 76, no. 21, pp. 22599-22622, 24 pp., November 2017. Status: published.
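The idea of combining per-modality beliefs in one MRF can be illustrated with a toy example. The sketch below is not the authors' model: the node names, label set, belief values, and the `agree`/`disagree` potentials are all hypothetical, and exhaustive search stands in for the belief propagation mentioned in the description. It only shows how a confident visual belief can resolve a textual mention that text alone would label differently.

```python
from itertools import product

# Hypothetical label set for a wildlife documentary (an assumption, not from the paper).
LABELS = ["lion", "zebra", "elephant"]


def map_assignment(unary, edges, agree=2.0, disagree=1.0):
    """Brute-force MAP labelling of a tiny pairwise MRF.

    unary: dict mapping node -> {label: non-negative belief}
    edges: list of (node_a, node_b) pairs linked across modalities
    agree / disagree: pairwise potentials for equal / different labels
    """
    nodes = sorted(unary)
    best_score, best = -1.0, None
    # Enumerate every joint labelling; feasible only for very small graphs.
    for labels in product(LABELS, repeat=len(nodes)):
        assignment = dict(zip(nodes, labels))
        score = 1.0
        for node in nodes:
            score *= unary[node][assignment[node]]
        for a, b in edges:
            score *= agree if assignment[a] == assignment[b] else disagree
        if score > best_score:
            best_score, best = score, assignment
    return best


# Made-up beliefs: the text-only method slightly prefers "zebra" for the
# mention "it", while the vision-only method is confident the region is a lion.
unary = {
    "mention:it": {"lion": 0.34, "zebra": 0.36, "elephant": 0.30},
    "frame:region0": {"lion": 0.80, "zebra": 0.15, "elephant": 0.05},
}
edges = [("mention:it", "frame:region0")]

result = map_assignment(unary, edges)
print(result)
```

With these made-up numbers, the text-only belief would pick "zebra" for the mention, but the joint labelling assigns "lion" to both nodes because the cross-modal edge rewards agreement with the confident visual belief.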
Database: OpenAIRE