The Optimal Choice of the Encoder–Decoder Model Components for Image Captioning

Author:	Mateusz Bartosiewicz, Marcin Iwanowski
Language:	English
Publication year:	2024
Subject:	image captioning; recurrent neural networks; feature extraction networks; Information technology; T58.5-58.64
Source:	Information, Vol 15, Iss 8, p 504 (2024)
Document type:	article
ISSN:	2078-2489
DOI:	10.3390/info15080504
Description:	Image captioning aims to generate meaningful verbal descriptions of a digital image. The field is growing rapidly thanks to the enormous increase in available computational resources. The most advanced methods are, however, resource-demanding. In our paper, we return to the encoder–decoder deep-learning model and investigate how replacing its components with newer equivalents improves overall effectiveness. The primary motivation of our study is to obtain the greatest possible improvement of classic methods, which remain applicable in computationally constrained environments where the most advanced models are too heavy to be applied efficiently. We investigate image feature extractors, recurrent neural networks, word embedding models, and word generation layers and discuss how each component influences the captioning model’s overall performance. Our experiments are performed on the MS COCO 2014 dataset. Our results show that replacing components improves the quality of the generated image captions. The results will help design efficient models with optimal combinations of their components.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/0254818f63214ad1980c9a32fd3cb29e