Search results - "Singh, Aashu"

Report

CompCap: Improving Multimodal Large Language Models with Composite Captions

Authors: Chen, Xiaohui; Shukla, Satya Narayan; Azab, Mahmoud; Singh, Aashu; Wang, Qifan; Yang, David; Peng, ShengYun; Yu, Hanchao; Yan, Shen; Zhang, Xuewen; He, Baosheng

How well can Multimodal Large Language Models (MLLMs) understand composite images? Composite images (CIs) are synthetic visuals created by merging multiple visual elements, such as charts, posters, or screenshots, rather than being captured directly

External link: http://arxiv.org/abs/2412.05243
