Comprehensive Overview of Attention-Driven Models for Image Captioning.

Authors: Jaiswal, Sushma; Pallthadka, Harikumar; Chinchewadi, Rajesh P.; Jaiswal, Tarun
Subject:
Source: IUP Journal of Computer Sciences; Apr2024, Vol. 18 Issue 2, p7-19, 13p
Abstract: The paper examines in depth the main ideas, advancements, and applications of attention-driven models for image captioning. It explains the basic ideas behind attention mechanisms and how they apply to image captioning tasks, and it explores a number of attention-based models in detail, classifying them by their architectural layouts and attention mechanisms. The paper also examines self-attention, single- and multi-modal attention, and other techniques that have driven substantial progress in image captioning, and it investigates the crucial role attention plays in improving the quality and contextual relevance of generated captions. It explores the fusion of different modalities and the integration of attention mechanisms with pretrained models to generate more informative and coherent captions, highlighting the resulting gains in model performance: improved caption accuracy, closer relevance to image content, and versatility across a range of visual scenarios. Finally, it addresses the open problems in attention-driven image captioning, including managing long-range dependencies, handling uncommon or unique concepts, and improving the interpretability of attention weights. [ABSTRACT FROM AUTHOR]
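The record does not reproduce the paper's formulations. As a generic illustration only (not the authors' method), the sketch below shows the basic idea of attention in captioning: when producing the next caption word, a decoder query scores each image region, the scores are normalized into attention weights, and the regions are combined into a weighted context vector. It assumes each region is already encoded as a fixed-length feature vector and uses scaled dot-product scoring; the names `softmax` and `attend` are hypothetical.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: List[float], regions: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Scaled dot-product attention of one decoder query over image region features.

    Assumption (illustrative only): each region is a feature vector of the same
    length as the query. Returns (weights, context): one weight per region, and
    the attention-weighted sum of the region vectors.
    """
    d = len(query)
    # Score each region by its dot product with the query, scaled by sqrt(d).
    scores = [sum(q * r for q, r in zip(query, region)) / math.sqrt(d) for region in regions]
    # Normalize the scores into attention weights that sum to 1.
    weights = softmax(scores)
    # Context vector: weighted sum of region features, dimension by dimension.
    context = [sum(w * region[j] for w, region in zip(weights, regions)) for j in range(d)]
    return weights, context


weights, context = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(weights, context)
```

Here the first and third regions match the query equally and receive equal, larger weights than the second. Inspecting these per-region weights is what the interpretability question about attention weights refers to.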
Database: Complementary Index