Showing 1 - 10 of 10
for search: '"Grasch, Peter"'
Author:
Lai, Zhengfeng, Saveris, Vasileios, Chen, Chen, Chen, Hong-You, Zhang, Haotian, Zhang, Bowen, Tebar, Juan Lao, Hu, Wenze, Gan, Zhe, Grasch, Peter, Cao, Meng, Yang, Yinfei
Recent advancements in multimodal models highlight the value of rewritten captions for improving performance, yet key challenges remain. For example, while synthetic captions often provide superior quality and image-text alignment, it is not clear…
External link:
http://arxiv.org/abs/2410.02740
Author:
Zhang, Haotian, Gao, Mingfei, Gan, Zhe, Dufter, Philipp, Wenzel, Nina, Huang, Forrest, Shah, Dhruti, Du, Xianzhi, Zhang, Bowen, Li, Yanghao, Dodge, Sam, You, Keen, Yang, Zhen, Timofeev, Aleksei, Xu, Mingze, Chen, Hong-You, Fauconnier, Jean-Philippe, Lai, Zhengfeng, You, Haoxuan, Wang, Zirui, Dehghan, Afshin, Grasch, Peter, Yang, Yinfei
We present MM1.5, a new family of multimodal large language models (MLLMs) designed to enhance capabilities in text-rich image understanding, visual referring and grounding, and multi-image reasoning. Building upon the MM1 architecture, MM1.5 adopts…
External link:
http://arxiv.org/abs/2409.20566
Author:
Amirloo, Elmira, Fauconnier, Jean-Philippe, Roesmann, Christoph, Kerl, Christian, Boney, Rinu, Qian, Yusu, Wang, Zirui, Dehghan, Afshin, Yang, Yinfei, Gan, Zhe, Grasch, Peter
Preference alignment has become a crucial component in enhancing the performance of Large Language Models (LLMs), yet its impact in Multimodal Large Language Models (MLLMs) remains comparatively underexplored. Similar to language models, MLLMs for…
External link:
http://arxiv.org/abs/2407.02477
We introduce MIA-Bench, a new benchmark designed to evaluate multimodal large language models (MLLMs) on their ability to strictly adhere to complex instructions. Our benchmark comprises a diverse set of 400 image-prompt pairs, each crafted to…
External link:
http://arxiv.org/abs/2407.01509
Author:
McKinzie, Brandon, Gan, Zhe, Fauconnier, Jean-Philippe, Dodge, Sam, Zhang, Bowen, Dufter, Philipp, Shah, Dhruti, Du, Xianzhi, Peng, Futang, Weers, Floris, Belyi, Anton, Zhang, Haotian, Singh, Karanjeet, Kang, Doug, Jain, Ankur, Hè, Hongyu, Schwarzer, Max, Gunter, Tom, Kong, Xiang, Zhang, Aonan, Wang, Jianyu, Wang, Chong, Du, Nan, Lei, Tao, Wiseman, Sam, Yin, Guoli, Lee, Mark, Wang, Zirui, Pang, Ruoming, Grasch, Peter, Toshev, Alexander, Yang, Yinfei
In this work, we discuss building performant Multimodal Large Language Models (MLLMs). In particular, we study the importance of various architecture components and data choices. Through careful and comprehensive ablations of the image encoder…
External link:
http://arxiv.org/abs/2403.09611
In this paper, we study the "stability" of machine learning (ML) models within the context of larger, complex NLP systems with continuous training data updates. For this study, we propose a methodology for the assessment of model stability…
External link:
http://arxiv.org/abs/2201.05692
Author:
Muralidharan, Deepak, Moniz, Joel Ruben Antony, Gao, Sida, Yang, Xiao, Kao, Justine, Pulman, Stephen, Kothari, Atish, Shen, Ray, Pan, Yinying, Kaul, Vivek, Ibrahim, Mubarak Seyed, Xiang, Gang, Dun, Nan, Zhou, Yidan, O, Andy, Zhang, Yuan, Chitkara, Pooja, Wang, Xuan, Patel, Alkesh, Tayal, Kushal, Zheng, Roger, Grasch, Peter, Williams, Jason D., Li, Lin
Named Entity Recognition (NER) and Entity Linking (EL) play an essential role in voice assistant interaction, but are challenging due to the special difficulties associated with spoken user queries. In this paper, we propose a novel architecture that…
External link:
http://arxiv.org/abs/2005.14408
Author:
Grasch, Peter, Felfernig, Alexander
Published in:
I-com; Apr 2015, Vol. 14, Issue 1, p41-52, 12p
Published in:
Proceedings of the 7th ACM Conference on Recommender Systems; 10/12/2013, p157-164, 8p