Disambiguating Visual Verbs
Authors: | Frank Keller, Mirella Lapata, Spandana Gella |
Year: | 2017 |
Subject: |
Computer science, Natural language processing, Word-sense disambiguation, Semantics, Image retrieval, Artificial intelligence, Computer Vision and Pattern Recognition, Computational Theory and Mathematics, Applied Mathematics, Software |
Source: | Gella, S., Keller, F. & Lapata, M. 2019, 'Disambiguating Visual Verbs', IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 2, pp. 311-322. https://doi.org/10.1109/TPAMI.2017.2786699 |
ISSN: | 0162-8828 |
DOI: | 10.1109/TPAMI.2017.2786699 |
Description: | In this article, we introduce a new task, visual sense disambiguation for verbs: given an image and a verb, assign the correct sense of the verb, i.e., the one that describes the action depicted in the image. Just as textual word sense disambiguation is useful for a wide range of NLP tasks, visual sense disambiguation can be useful for multimodal tasks such as image retrieval, image description, and text illustration. We introduce a new dataset, which we call VerSe (short for Verb Sense), that augments existing multimodal datasets (COCO and TUHOI) with verb and sense labels. We explore supervised and unsupervised models for the sense disambiguation task using textual, visual, and multimodal embeddings. We also consider a scenario in which we must detect the verb depicted in an image prior to predicting its sense (i.e., there is no verbal information associated with the image). We find that textual embeddings perform well when gold-standard annotations (object labels and image descriptions) are available, while multimodal embeddings perform well on unannotated images. VerSe is publicly available at https://github.com/spandanagella/verse. |
Database: | OpenAIRE |
External link: |
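The description above mentions unsupervised disambiguation with textual, visual, and multimodal embeddings. A minimal sketch of that general idea, not the paper's actual model: represent each verb sense by an embedding (e.g., of its dictionary gloss), represent the image by an embedding, and predict the sense whose embedding is nearest by cosine similarity. All vectors, sense labels, and function names below are illustrative assumptions.

```python
# Toy sketch of embedding-based visual verb sense disambiguation:
# pick the sense whose (gloss) embedding is most similar to the
# image's (multimodal) embedding under cosine similarity.
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def disambiguate(image_vec, sense_vecs):
    """Return the sense id whose embedding is closest to the image embedding."""
    return max(sense_vecs, key=lambda s: cosine(image_vec, sense_vecs[s]))

# Hypothetical 3-d embeddings for two senses of the verb "play".
senses = {
    "play:music": [0.9, 0.1, 0.0],
    "play:sport": [0.1, 0.9, 0.2],
}
image = [0.2, 0.8, 0.1]  # stand-in for a visual/multimodal embedding
print(disambiguate(image, senses))  # → play:sport
```

In practice the sense and image embeddings would come from pretrained text and vision models rather than hand-set vectors; the nearest-neighbor decision rule is the same.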