Author:
Bharti, Shubham, Cheng, Shiyun, Rho, Jihyun, Rao, Martina, Zhu, Xiaojin
Publication year:
2024
Subject:
Document type:
Working Paper
Description:
We introduce CHARTOM, a visual theory-of-mind benchmark for multimodal large language models. CHARTOM consists of specially designed data visualization charts. Given a chart, a language model must not only comprehend the chart correctly (the FACT question) but also judge whether the chart would mislead a human reader (the MIND question). Answering both questions well has significant societal benefit. We detail the construction of the CHARTOM benchmark, including its calibration against human performance.
Database:
arXiv
External link: