Search results - "Image Captioning"

Academic article

Thangka image captioning model with Salient Attention and Local Interaction Aggregator.

Authors: Hu, Wenjin (wenjin_zhm@126.com); Zhang, Fujun; Zhao, Yinqiu

Published in: Heritage Science. 11/19/2024, Vol. 12 Issue 1, p1-21. 21p.

Report

Personalizing Multimodal Large Language Models for Image Captioning: An Experimental Analysis

Authors: Bucciarelli, Davide; Moratelli, Nicholas; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita

The task of image captioning demands an algorithm to generate natural language descriptions of visual inputs. Recent advancements have seen a convergence between image captioning research and the development of Large Language Models (LLMs) and Multim

External link: http://arxiv.org/abs/2412.03665

Report

FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity

Authors: Hua, Hang; Liu, Qing; Zhang, Lingzhi; Shi, Jing; Zhang, Zhifei; Wang, Yilin; Zhang, Jianming; Luo, Jiebo

The advent of large Vision-Language Models (VLMs) has significantly advanced multimodal tasks, enabling more sophisticated and accurate reasoning across various applications, including image and video captioning, visual question answering, and cross-

External link: http://arxiv.org/abs/2411.15411

Report

Uterine Ultrasound Image Captioning Using Deep Learning Techniques

Authors: Boulesnane, Abdennour; Mokhtari, Boutheina; Segueni, Oumnia Rana; Segueni, Slimane

Medical imaging has significantly revolutionized medical diagnostics and treatment planning, progressing from early X-ray usage to sophisticated methods like MRIs, CT scans, and ultrasounds. This paper investigates the use of deep learning for medica

External link: http://arxiv.org/abs/2411.14039

Report

The Power of Many: Multi-Agent Multimodal Models for Cultural Image Captioning

Authors: Bai, Longju; Borah, Angana; Ignat, Oana; Mihalcea, Rada

Large Multimodal Models (LMMs) exhibit impressive performance across various multimodal tasks. However, their effectiveness in cross-cultural contexts remains limited due to the predominantly Western-centric nature of most data and models. Conversely

External link: http://arxiv.org/abs/2411.11758

Report

Image Generation from Image Captioning -- Invertible Approach

Authors: Menon, Nandakishore S; Kamanchi, Chandramouli; Diddigi, Raghuram Bharadwaj

Our work aims to build a model that performs dual tasks of image captioning and image generation while being trained on only one task. The central idea is to train an invertible model that learns a one-to-one mapping between the image and text embedd

External link: http://arxiv.org/abs/2410.20171

Report

CAPEEN: Image Captioning with Early Exits and Knowledge Distillation

Authors: Bajpai, Divya Jyoti; Hanawal, Manjesh Kumar

Deep neural networks (DNNs) have made significant progress in recognizing visual elements and generating descriptive text in image-captioning tasks. However, their improved performance comes from increased computational burden and inference latency.

External link: http://arxiv.org/abs/2410.04433

Academic article

Image Captioning Based on Semantic Scenes.

Authors: Zhao, Fengzhi; Yu, Zhezhou (yuzz@jlu.edu.cn); Wang, Tao (taowang19@mails.jlu.edu.cn); Lv, Yi (lvyi18@mails.jlu.edu.cn)

Published in: Entropy. Oct2024, Vol. 26 Issue 10, p876. 20p.

Report

DIR: Retrieval-Augmented Image Captioning with Comprehensive Understanding

Authors: Wu, Hao; Zhong, Zhihang; Sun, Xiao

Image captioning models often suffer from performance degradation when applied to novel datasets, as they are typically trained on domain-specific data. To enhance generalization in out-of-domain scenarios, retrieval-augmented approaches have garnere

External link: http://arxiv.org/abs/2412.01115

Report

OFCap: Object-aware Fusion for Image Captioning

Author: Huang, Feiyang

Image captioning is a technique that translates image content into natural language descriptions. Many application scenarios, such as intelligent search engines and assistive tools for visually impaired individuals, involve images containing people.

External link: http://arxiv.org/abs/2412.00095
