Saliency based Subject Selection for Diverse Image Captioning
| Author: | Duc Minh Vo, Quoc-An Luong, Akihiro Sugimoto |
| --- | --- |
| Year of publication: | 2021 |
| Subject: | Closed captioning; Structure (mathematical logic); Root (linguistics); Machine vision; Computer science; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; Pattern recognition; Visualization; Selection (linguistics); Scene graph; Artificial intelligence; Representation (mathematics) |
| Source: | MVA |
| DOI: | 10.23919/mva51890.2021.9511360 |
| Description: | Image captioning has drawn increasing attention because of its practical usefulness in many multimedia applications. Multiple criteria, such as accuracy, detail, and diversity, exist for evaluating the quality of generated captions. Among them, diversity is the most difficult to achieve because multiple captions should be generated for a given image while retaining their accuracy. We approach diverse image captioning by explicitly selecting objects in an image, one by one, as the subject when generating captions. Our method has three main steps: (1) after generating the scene graph of a given image, we give selection priority to the nodes (namely, subjects) in the scene graph based on the size and visual saliency of the objects; (2) for a selected subject, we prune the portion of the scene graph that is irrelevant to the subject to obtain a subject-oriented scene graph for accurate captioning; (3) we convert the subject-oriented scene graph into its more sentence-friendly abstract meaning representation (AMR) to generate a caption whose subject is the selected root. In this way, we can generate captions whose subjects differ from each other, achieving diversity. Our proposed method achieves results comparable with other methods in both diversity and accuracy. (A minimal illustrative sketch of the three-step pipeline follows this record.) |
| Database: | OpenAIRE |
| External link: | |
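
The sketch below illustrates the three steps described in the abstract on a toy scene graph: rank candidate subjects by size and saliency, prune the graph around each selected subject, and generate one caption per subject. The data structures, the equal weighting of size and saliency, and the trivial graph-to-text step are assumptions made for illustration only; the paper's learned scene-graph generator, saliency model, and AMR-based caption generator are not reproduced here.

```python
# Minimal sketch of the subject-selection-and-pruning pipeline (assumed, simplified).
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    area: float      # normalized object size in [0, 1] (assumed input)
    saliency: float  # normalized visual saliency in [0, 1] (assumed input)

@dataclass
class SceneGraph:
    nodes: dict                                 # name -> Node
    edges: list = field(default_factory=list)   # (subject, relation, object) triples

def rank_subjects(graph: SceneGraph) -> list:
    """Step 1: give selection priority to nodes by object size and visual saliency.
    The equal weighting below is an assumption, not the paper's exact formula."""
    return sorted(graph.nodes.values(),
                  key=lambda n: 0.5 * n.area + 0.5 * n.saliency,
                  reverse=True)

def prune_to_subject(graph: SceneGraph, subject: str) -> SceneGraph:
    """Step 2: keep only the part of the scene graph reachable from the subject,
    yielding a subject-oriented scene graph."""
    keep, frontier = {subject}, [subject]
    while frontier:
        current = frontier.pop()
        for s, _, o in graph.edges:
            if s == current and o not in keep:
                keep.add(o)
                frontier.append(o)
    nodes = {name: graph.nodes[name] for name in keep}
    edges = [(s, r, o) for s, r, o in graph.edges if s in keep and o in keep]
    return SceneGraph(nodes, edges)

def caption_from_graph(graph: SceneGraph, subject: str) -> str:
    """Step 3 stand-in: the paper converts the subject-oriented scene graph into an
    AMR rooted at the subject and decodes it with a learned generator; here we
    simply linearize a triple so the sketch runs end to end."""
    parts = [f"{s} {r} {o}" for s, r, o in graph.edges if s == subject]
    return parts[0] if parts else f"a {subject}"

# Toy example: one caption per selected subject gives a diverse caption set.
g = SceneGraph(
    nodes={"man": Node("man", 0.4, 0.9),
           "dog": Node("dog", 0.2, 0.7),
           "frisbee": Node("frisbee", 0.05, 0.5)},
    edges=[("man", "throws", "frisbee"), ("dog", "chases", "frisbee")],
)
for node in rank_subjects(g):
    sub_graph = prune_to_subject(g, node.name)
    print(caption_from_graph(sub_graph, node.name))
```

Running the toy example prints one caption per subject ("man throws frisbee", "dog chases frisbee", "a frisbee"), which mirrors how selecting a different root subject yields a different caption for the same image.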