Showing 1 - 10 of 10
for search: '"Grasch, Peter"'
Author:
Lai, Zhengfeng, Saveris, Vasileios, Chen, Chen, Chen, Hong-You, Zhang, Haotian, Zhang, Bowen, Tebar, Juan Lao, Hu, Wenze, Gan, Zhe, Grasch, Peter, Cao, Meng, Yang, Yinfei
Recent advancements in multimodal models highlight the value of rewritten captions for improving performance, yet key challenges remain. For example, while synthetic captions often provide superior quality and image-text alignment, it is not clear…
External link:
http://arxiv.org/abs/2410.02740
Author:
Zhang, Haotian, Gao, Mingfei, Gan, Zhe, Dufter, Philipp, Wenzel, Nina, Huang, Forrest, Shah, Dhruti, Du, Xianzhi, Zhang, Bowen, Li, Yanghao, Dodge, Sam, You, Keen, Yang, Zhen, Timofeev, Aleksei, Xu, Mingze, Chen, Hong-You, Fauconnier, Jean-Philippe, Lai, Zhengfeng, You, Haoxuan, Wang, Zirui, Dehghan, Afshin, Grasch, Peter, Yang, Yinfei
We present MM1.5, a new family of multimodal large language models (MLLMs) designed to enhance capabilities in text-rich image understanding, visual referring and grounding, and multi-image reasoning. Building upon the MM1 architecture, MM1.5 adopts…
External link:
http://arxiv.org/abs/2409.20566
Author:
Amirloo, Elmira, Fauconnier, Jean-Philippe, Roesmann, Christoph, Kerl, Christian, Boney, Rinu, Qian, Yusu, Wang, Zirui, Dehghan, Afshin, Yang, Yinfei, Gan, Zhe, Grasch, Peter
Preference alignment has become a crucial component in enhancing the performance of Large Language Models (LLMs), yet its impact in Multimodal Large Language Models (MLLMs) remains comparatively underexplored. Similar to language models, MLLMs for…
External link:
http://arxiv.org/abs/2407.02477
We introduce MIA-Bench, a new benchmark designed to evaluate multimodal large language models (MLLMs) on their ability to strictly adhere to complex instructions. Our benchmark comprises a diverse set of 400 image-prompt pairs, each crafted to…
External link:
http://arxiv.org/abs/2407.01509
Author:
McKinzie, Brandon, Gan, Zhe, Fauconnier, Jean-Philippe, Dodge, Sam, Zhang, Bowen, Dufter, Philipp, Shah, Dhruti, Du, Xianzhi, Peng, Futang, Weers, Floris, Belyi, Anton, Zhang, Haotian, Singh, Karanjeet, Kang, Doug, Jain, Ankur, Hè, Hongyu, Schwarzer, Max, Gunter, Tom, Kong, Xiang, Zhang, Aonan, Wang, Jianyu, Wang, Chong, Du, Nan, Lei, Tao, Wiseman, Sam, Yin, Guoli, Lee, Mark, Wang, Zirui, Pang, Ruoming, Grasch, Peter, Toshev, Alexander, Yang, Yinfei
In this work, we discuss building performant Multimodal Large Language Models (MLLMs). In particular, we study the importance of various architecture components and data choices. Through careful and comprehensive ablations of the image encoder…
External link:
http://arxiv.org/abs/2403.09611
In this paper, we study the "stability" of machine learning (ML) models within the context of larger, complex NLP systems with continuous training data updates. For this study, we propose a methodology for the assessment of model stability…
External link:
http://arxiv.org/abs/2201.05692
Author:
Muralidharan, Deepak, Moniz, Joel Ruben Antony, Gao, Sida, Yang, Xiao, Kao, Justine, Pulman, Stephen, Kothari, Atish, Shen, Ray, Pan, Yinying, Kaul, Vivek, Ibrahim, Mubarak Seyed, Xiang, Gang, Dun, Nan, Zhou, Yidan, O, Andy, Zhang, Yuan, Chitkara, Pooja, Wang, Xuan, Patel, Alkesh, Tayal, Kushal, Zheng, Roger, Grasch, Peter, Williams, Jason D., Li, Lin
Named Entity Recognition (NER) and Entity Linking (EL) play an essential role in voice assistant interaction, but are challenging due to the special difficulties associated with spoken user queries. In this paper, we propose a novel architecture that…
External link:
http://arxiv.org/abs/2005.14408
Author:
Grasch, Peter, Felfernig, Alexander
Published in:
I-com; Apr 2015, Vol. 14, Issue 1, p41-52, 12p
Published in:
Proceedings of the 7th ACM Conference on Recommender Systems; 10/12/2013, p157-164, 8p