Showing 1 - 10 of 1,712 for search: '"Yue, Xiang"'
Authors:
Onohara, Shota, Miyai, Atsuyuki, Imajuku, Yuki, Egashira, Kazuki, Baek, Jeonghun, Yue, Xiang, Neubig, Graham, Aizawa, Kiyoharu
Accelerating research on Large Multimodal Models (LMMs) in non-English languages is crucial for enhancing user experiences across broader populations. In this paper, we introduce JMMMU (Japanese MMMU), the first large-scale Japanese benchmark designed…
External link:
http://arxiv.org/abs/2410.17250
The electrocardiogram (ECG) is an essential non-invasive diagnostic tool for assessing cardiac conditions. Existing automatic interpretation methods suffer from limited generalizability, focusing on a narrow range of cardiac conditions, and typically…
External link:
http://arxiv.org/abs/2410.19008
Authors:
Yue, Xiang, Song, Yueqi, Asai, Akari, Kim, Seungone, Nyandwi, Jean de Dieu, Khanuja, Simran, Kantharuban, Anjali, Sutawika, Lintang, Ramamoorthy, Sathyanarayanan, Neubig, Graham
Despite recent advances in multimodal large language models (MLLMs), their development has predominantly focused on English- and Western-centric datasets and tasks, leaving most of the world's languages and diverse cultural contexts underrepresented.
External link:
http://arxiv.org/abs/2410.16153
Authors:
Liu, Junpeng, Ou, Tianyue, Song, Yifan, Qu, Yuxiao, Lam, Wai, Xiong, Chenyan, Chen, Wenhu, Neubig, Graham, Yue, Xiang
Text-rich visual understanding, the ability to process environments where dense textual content is integrated with visuals, is crucial for multimodal large language models (MLLMs) to interact effectively with structured environments. To enhance this capability…
External link:
http://arxiv.org/abs/2410.13824
Authors:
Ni, Jinjie, Song, Yifan, Ghosal, Deepanway, Li, Bo, Zhang, David Junhao, Yue, Xiang, Xue, Fuzhao, Zheng, Zian, Zhang, Kaichen, Shah, Mahir, Jain, Kabir, You, Yang, Shieh, Michael
Perceiving and generating diverse modalities are crucial for AI models to effectively learn from and engage with real-world signals, necessitating reliable evaluations for their development. We identify two major issues in current evaluations: (1) in…
External link:
http://arxiv.org/abs/2410.13754
Authors:
Chen, Jiacheng, Liang, Tianhao, Siu, Sherman, Wang, Zhengqing, Wang, Kai, Wang, Yubo, Ni, Yuansheng, Zhu, Wang, Jiang, Ziyan, Lyu, Bohan, Jiang, Dongfu, He, Xuan, Liu, Yuan, Hu, Hexiang, Yue, Xiang, Chen, Wenhu
We present MEGA-Bench, an evaluation suite that scales multimodal evaluation to over 500 real-world tasks, to address the highly heterogeneous daily use cases of end users. Our objective is to optimize for a set of high-quality data samples that cover…
External link:
http://arxiv.org/abs/2410.10563
Authors:
Ma, Kaijing, Du, Xinrun, Wang, Yunran, Zhang, Haoran, Wen, Zhoufutu, Qu, Xingwei, Yang, Jian, Liu, Jiaheng, Liu, Minghao, Yue, Xiang, Huang, Wenhao, Zhang, Ge
In this paper, we introduce Knowledge-Orthogonal Reasoning (KOR), which minimizes the impact of domain-specific knowledge for a more accurate evaluation of models' reasoning abilities in out-of-distribution scenarios. Based on this concept, we propose…
External link:
http://arxiv.org/abs/2410.06526
Understanding visual semantics embedded in consecutive characters is a crucial capability for both large language models (LLMs) and multi-modal large language models (MLLMs). This type of artifact possesses the unique characteristic that identical in…
External link:
http://arxiv.org/abs/2410.01733
We introduce SimulBench, a benchmark designed to evaluate large language models (LLMs) across a diverse collection of creative simulation scenarios, such as acting as a Linux terminal or playing text games with users. While these simulation tasks serve…
External link:
http://arxiv.org/abs/2409.07641
Authors:
Zhu, King, Zang, Qianbo, Jia, Shian, Wu, Siwei, Fang, Feiteng, Li, Yizhi, Gavin, Shawn, Zheng, Tuney, Guo, Jiawei, Li, Bo, Wu, Haoning, Qu, Xingwei, Yang, Jian, Liu, Zachary, Yue, Xiang, Liu, J. H., Lin, Chenghua, Yang, Min, Ni, Shiwen, Huang, Wenhao, Zhang, Ge
Multimodal Large Language Models (MLLMs) are evaluated on various benchmarks, such as image captioning, visual question answering, and reasoning. However, many of these benchmarks include overly simple or uninformative samples, complicating the effec…
External link:
http://arxiv.org/abs/2409.06851