Showing 1 - 7 of 7 results for search: '"Zeng, Xingchen"'
1. Emerging multimodal large language models (MLLMs) exhibit great potential for chart question answering (CQA). Recent efforts primarily focus on scaling up training datasets (i.e., charts, data tables, and question-answer (QA) pairs) through data collection …
External link: http://arxiv.org/abs/2407.20174
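For context, a training record for chart question answering typically bundles a chart image, its underlying data table, and a QA pair, as the snippet describes. A minimal hypothetical sketch of such a record follows; the field names, paths, and values are assumptions for illustration, not the paper's actual schema:

```python
# Hypothetical shape of one chart-QA instruction-tuning sample; field names
# and values are illustrative only, not taken from the paper's dataset.
sample = {
    "chart_image": "charts/gdp_by_year.png",   # rendered chart (made-up path)
    "data_table": [                            # underlying data table
        {"year": 2020, "gdp_growth": -3.1},
        {"year": 2021, "gdp_growth": 5.9},
    ],
    "question": "In which year was GDP growth higher?",
    "answer": "2021",
}
```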
2. Multi-modal embeddings, such as CLIP embeddings (the most widely used text-image embeddings), form the foundation for vision-language models. However, these embeddings are vulnerable to subtle misalignment of cross-modal features, resulting in decreased …
External link: http://arxiv.org/abs/2407.12315
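To make the mechanism concrete, here is a minimal sketch of scoring text-image alignment with the standard openai/clip-vit-base-patch32 checkpoint via Hugging Face transformers. This illustrates the shared embedding space the snippet refers to, not the paper's contribution; the image path is a placeholder.

```python
# Minimal sketch of CLIP text-image similarity scoring (off-the-shelf
# checkpoint, not the paper's method).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")  # placeholder local image
texts = ["a photo of a cat", "a photo of a dog"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Similarity logits between the image and each text; subtle cross-modal
# misalignment of the kind the snippet mentions shows up as unreliable
# rankings in exactly this score.
probs = outputs.logits_per_image.softmax(dim=-1)
print(probs)
```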
3. Fine-tuning facilitates the adaptation of text-to-image generative models to novel concepts (e.g., styles and portraits), empowering users to forge creatively customized content. Recent efforts on fine-tuning focus on reducing training data and lightweight …
External link: http://arxiv.org/abs/2401.15559
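As general background, one widely used lightweight fine-tuning technique is a low-rank (LoRA-style) adapter, sketched below in plain PyTorch. The snippet does not say which method this paper proposes, so this is generic illustration of the "lightweight fine-tuning" idea, with all names and dimensions assumed.

```python
# Generic LoRA-style adapter sketch: the frozen base weight is augmented
# with a trainable low-rank update, so few parameters are fine-tuned.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen linear layer with a trainable low-rank update."""
    def __init__(self, base: nn.Linear, rank: int = 4, scale: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # only the adapter is trained
        # Standard LoRA init: A random, B zero, so training starts at identity.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(768, 768))  # e.g., an attention projection
out = layer(torch.randn(2, 768))
```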
4. Academic article (this result cannot be displayed to non-logged-in users; sign in to view it).
5. Author: Zeng, Xingchen; Zhou, Haowen; Li, Zhicong; Zhang, Chenqi; Lin, Juncong; Xia, Jiazhi; Yang, Yanyi; Kui, Xiaoyan
Published in: Journal of Visualization, June 2023, Vol. 26, Issue 3, pp. 631-648 (18 pages)
6. Published in: IEEE Transactions on Visualization and Computer Graphics, Vol. PP; electronic publication date 10 Sep 2024.
7. Published in: IEEE Transactions on Visualization and Computer Graphics, Vol. PP; electronic publication date 9 Sep 2024.