Image caption generation using transformer learning methods: a case study on instagram image.

Autor: Dittakan, Kwankamon, Prompitak, Kamontorn, Thungklang, Phutphisit, Wongwattanakit, Chatchawan
Předmět:
Zdroj: Multimedia Tools & Applications; May2024, Vol. 83 Issue 15, p46397-46417, 21p
Abstrakt: Nowadays, images are being used more extensively for communication purposes. A single image can convey a variety of stories, depending on the perspective and thoughts of everyone who views it. To facilitate comprehension, inclusion image captions is highly beneficial, especially for individuals with visual impairments who can read Braille or rely on audio descriptions. The purpose of this research is to create an automatic captioning system that is easy to understand and quick to generate. This system can be applied to other related systems. In this research, the transformer learning process is applied to image captioning instead of the convolutional neural networks (CNN) and recurrent neural networks (RNN) process which has limitations in processing long-sequence data and managing data complexity. The transformer learning process can handle these limitations well and more efficiently. Additionally, the image captioning system was trained on a dataset of 5,000 images from Instagram that were tagged with the hashtag "Phuket" (#Phuket). The researchers also wrote the captions themselves to use as a dataset for testing the image captioning system. The experiments showed that the transformer learning process can generate natural captions that are close to human language. The generated captions will also be evaluated using the Bilingual Evaluation Understudy (BLEU) score and Metric for Evaluation of Translation with Explicit Ordering (METEOR) score, a metric for measuring the similarity between machine-translated text and human-written text. This will allow us to compare the resemblance between the researcher-written captions and the transformer-generated captions. [ABSTRACT FROM AUTHOR]
Databáze: Complementary Index