Showing 1 - 10 of 16,832 results for search: '"Image Captioning"'
Published in:
Heritage Science. 11/19/2024, Vol. 12 Issue 1, p1-21. 21p.
Author:
Bucciarelli, Davide, Moratelli, Nicholas, Cornia, Marcella, Baraldi, Lorenzo, Cucchiara, Rita
The task of image captioning demands an algorithm to generate natural language descriptions of visual inputs. Recent advancements have seen a convergence between image captioning research and the development of Large Language Models (LLMs) and Multim
External link:
http://arxiv.org/abs/2412.03665
Author:
Hua, Hang, Liu, Qing, Zhang, Lingzhi, Shi, Jing, Zhang, Zhifei, Wang, Yilin, Zhang, Jianming, Luo, Jiebo
The advent of large Vision-Language Models (VLMs) has significantly advanced multimodal tasks, enabling more sophisticated and accurate reasoning across various applications, including image and video captioning, visual question answering, and cross-
External link:
http://arxiv.org/abs/2411.15411
Medical imaging has significantly revolutionized medical diagnostics and treatment planning, progressing from early X-ray usage to sophisticated methods like MRIs, CT scans, and ultrasounds. This paper investigates the use of deep learning for medica
External link:
http://arxiv.org/abs/2411.14039
Large Multimodal Models (LMMs) exhibit impressive performance across various multimodal tasks. However, their effectiveness in cross-cultural contexts remains limited due to the predominantly Western-centric nature of most data and models. Conversely
External link:
http://arxiv.org/abs/2411.11758
Our work aims to build a model that performs dual tasks of image captioning and image generation while being trained on only one task. The central idea is to train an invertible model that learns a one-to-one mapping between the image and text embedd
External link:
http://arxiv.org/abs/2410.20171
Deep neural networks (DNNs) have made significant progress in recognizing visual elements and generating descriptive text in image-captioning tasks. However, their improved performance comes from increased computational burden and inference latency.
External link:
http://arxiv.org/abs/2410.04433
Author:
Zhao, Fengzhi, Yu, Zhezhou (yuzz@jlu.edu.cn), Wang, Tao (taowang19@mails.jlu.edu.cn), Lv, Yi (lvyi18@mails.jlu.edu.cn)
Published in:
Entropy. Oct 2024, Vol. 26 Issue 10, p876. 20p.
Image captioning models often suffer from performance degradation when applied to novel datasets, as they are typically trained on domain-specific data. To enhance generalization in out-of-domain scenarios, retrieval-augmented approaches have garnere
External link:
http://arxiv.org/abs/2412.01115
Author:
Huang, Feiyang
Image captioning is a technique that translates image content into natural language descriptions. Many application scenarios, such as intelligent search engines and assistive tools for visually impaired individuals, involve images containing people.
External link:
http://arxiv.org/abs/2412.00095