Showing 1 - 10 of 257 for search: '"Wang, William"'
Author:
Wu, Qiucheng, Zhao, Handong, Saxon, Michael, Bui, Trung, Wang, William Yang, Zhang, Yang, Chang, Shiyu
Vision language models (VLMs) are an exciting emerging class of language models (LMs) that have merged classic LM capabilities with those of image processing systems. However, the ways that these capabilities combine are not always intuitive and warr…
External link:
http://arxiv.org/abs/2407.01863
We present LoCoVQA, a dynamic benchmark generator for evaluating long-context extractive reasoning in vision language models (VLMs). LoCoVQA augments test examples for mathematical reasoning, VQA, and character recognition tasks with increasingly lon…
External link:
http://arxiv.org/abs/2406.16851
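The LoCoVQA entry above describes padding test examples with increasingly long visual contexts. A minimal sketch of that augmentation idea in Python, with hypothetical helper names and data layout (the actual generator is the paper's contribution):

import random

def build_long_context_example(target_image, question, answer,
                               distractor_pool, context_len):
    # Hypothetical sketch: hide the one relevant image among
    # context_len - 1 distractors, so the model must first locate
    # the evidence and then answer the question.
    distractors = random.sample(distractor_pool, context_len - 1)
    images = distractors + [target_image]
    random.shuffle(images)  # position of the evidence is randomized
    return {"images": images, "question": question, "answer": answer}

Sweeping context_len (e.g. 1, 2, 4, 8, ...) over the same base question yields progressively harder long-context extractive-reasoning variants.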
Large language models (LLMs) have shown remarkable performance on code generation tasks. A recent application of LLMs for code generation is iterative code repair, where a model fixes an incorrect program by rationalizing about errors and generating…
External link:
http://arxiv.org/abs/2406.14867
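The entry above describes iterative code repair, where an LLM reasons about test failures and regenerates the program. A rough sketch of such a repair loop, with query_llm as a placeholder for any chat-completion call and a toy test harness that expects the candidate program to define solve():

def query_llm(prompt):
    # Placeholder for a real LLM call; substitute any completion API.
    raise NotImplementedError

def run_tests(program, tests):
    # Execute the candidate against (input, expected) pairs and collect
    # human-readable failure messages to feed back to the model.
    failures = []
    for inp, expected in tests:
        scope = {}
        try:
            exec(program, scope)
            got = scope["solve"](inp)
            if got != expected:
                failures.append(f"solve({inp!r}) -> {got!r}, expected {expected!r}")
        except Exception as exc:
            failures.append(f"solve({inp!r}) raised {exc!r}")
    return failures

def iterative_repair(program, tests, max_rounds=3):
    # Alternate between testing and asking the model to explain and fix
    # the observed errors, up to a fixed number of rounds.
    for _ in range(max_rounds):
        failures = run_tests(program, tests)
        if not failures:
            return program
        prompt = ("This program fails some tests.\n"
                  f"Program:\n{program}\n"
                  "Failures:\n" + "\n".join(failures) +
                  "\nExplain the bug, then output a corrected program.")
        program = query_llm(prompt)
    return program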
Author:
Amayuelas, Alfonso, Yang, Xianjun, Antoniades, Antonis, Hua, Wenyue, Pan, Liangming, Wang, William
Large Language Models (LLMs) have shown exceptional results on current benchmarks when working individually. The advancement in their capabilities, along with a reduction in parameter size and inference times, has facilitated the use of these models…
External link:
http://arxiv.org/abs/2406.14711
Author:
Wang, Danqing, Antoniades, Antonis, Luong, Kha-Dinh, Zhang, Edwin, Kosan, Mert, Li, Jiachen, Singh, Ambuj, Wang, William Yang, Li, Lei
Counterfactual explanations of Graph Neural Networks (GNNs) offer a powerful way to understand data that can naturally be represented by a graph structure. Furthermore, in many domains, it is highly desirable to derive data-driven global explanations…
External link:
http://arxiv.org/abs/2406.13869
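The entry above concerns counterfactual explanations for GNNs (and, in the paper, global ones). As background, an instance-level counterfactual is a small edit to the input graph that flips the model's prediction; a naive greedy sketch with a placeholder classifier:

import networkx as nx

def predict(graph):
    # Placeholder for a trained GNN graph classifier.
    raise NotImplementedError

def greedy_counterfactual(graph, max_edits=5):
    # Naive baseline: try single-edge removals; if none flips the
    # prediction, commit one removal and repeat, up to max_edits edits.
    original = predict(graph)
    g = graph.copy()
    for _ in range(max_edits):
        for u, v in list(g.edges()):
            trial = g.copy()
            trial.remove_edge(u, v)
            if predict(trial) != original:
                return trial  # counterfactual found
        if g.number_of_edges() == 0:
            break
        g.remove_edge(*next(iter(g.edges())))  # commit one edit, continue
    return None  # no counterfactual within the edit budget

A global explanation, as the paper targets, would summarize such counterfactual structure across many instances rather than per graph.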
Direct alignment from preferences (DAP) has emerged as a promising paradigm for aligning large language models (LLMs) to human desiderata from pre-collected, offline preference datasets. While recent studies indicate that existing offline DAP methods…
External link:
http://arxiv.org/abs/2406.12168
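The entry above concerns direct alignment from preferences (DAP), of which DPO is a canonical offline instance. A minimal PyTorch sketch of the standard DPO objective on precomputed per-response log-probabilities (summed over response tokens):

import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # Implicit rewards are the beta-scaled log-ratios between the policy
    # and the frozen reference model; the loss pushes the chosen response's
    # reward above the rejected one's.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()

# Toy usage with made-up log-probabilities for one preference pair:
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.0]))
print(loss.item())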
Recent breakthroughs in vision-language models (VLMs) emphasize the necessity of benchmarking human preferences in real-world multimodal interactions. To address this gap, we launched WildVision-Arena (WV-Arena), an online platform that collects huma…
External link:
http://arxiv.org/abs/2406.11069
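The entry above describes WildVision-Arena, which crowdsources pairwise human preference votes between VLMs. Arena-style leaderboards commonly aggregate such votes with Elo-style ratings; a minimal sketch (the actual WV-Arena aggregation may differ):

def elo_update(rating_a, rating_b, a_wins, k=32.0):
    # One rating update from a single pairwise human vote.
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    score_a = 1.0 if a_wins else 0.0
    rating_a += k * (score_a - expected_a)
    rating_b += k * ((1.0 - score_a) - (1.0 - expected_a))
    return rating_a, rating_b

# Example: model A beats model B once, starting from equal ratings.
print(elo_update(1000.0, 1000.0, a_wins=True))  # (1016.0, 984.0)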
Video generation has many unique challenges beyond those of image generation. The temporal dimension introduces extensive possible variations across frames, over which consistency and continuity may be violated. In this study, we move beyond evaluati…
External link:
http://arxiv.org/abs/2406.08656
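The entry above concerns evaluating video generation beyond per-frame quality, in particular consistency along the temporal dimension. One crude illustrative proxy (not the paper's method) is the mean similarity between embeddings of consecutive frames:

import numpy as np

def temporal_consistency(frame_embeddings):
    # frame_embeddings: array of shape (num_frames, dim).
    # Returns mean cosine similarity of adjacent frames; higher values
    # suggest smoother, more consistent clips under this simple proxy.
    e = frame_embeddings / np.linalg.norm(frame_embeddings, axis=1, keepdims=True)
    sims = np.sum(e[:-1] * e[1:], axis=1)
    return float(sims.mean())

# Toy usage with random embeddings for an 8-frame clip:
print(temporal_consistency(np.random.randn(8, 512)))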
Author:
He, Xuehai, Feng, Weixi, Zheng, Kaizhi, Lu, Yujie, Zhu, Wanrong, Li, Jiachen, Fan, Yue, Wang, Jianfeng, Li, Linjie, Yang, Zhengyuan, Lin, Kevin, Wang, William Yang, Wang, Lijuan, Wang, Xin Eric
Multimodal Large Language Models (MLLMs) demonstrate the emerging abilities of "world models" -- interpreting and reasoning about complex real-world dynamics. To assess these abilities, we posit videos are the ideal medium, as they encapsulate ric…
External link:
http://arxiv.org/abs/2406.08407
We present a novel task and benchmark for evaluating the ability of text-to-image (T2I) generation models to produce images that fit commonsense in real life, which we call Commonsense-T2I. Given two adversarial text prompts containing an identical se…
External link:
http://arxiv.org/abs/2406.07546
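The final entry describes Commonsense-T2I, which pairs adversarial prompts and checks whether a text-to-image model's outputs fit real-life commonsense. A hypothetical sketch of one way such a pairwise harness could score a prompt pair, with placeholder generation and judging functions:

def generate_image(prompt):
    # Placeholder for a text-to-image model call.
    raise NotImplementedError

def matches_expectation(image, expected_description):
    # Placeholder judge (e.g. a VQA model or a human rater) that checks
    # whether the image shows the commonsense-expected outcome.
    raise NotImplementedError

def score_pair(prompt_a, expected_a, prompt_b, expected_b):
    # One possible scoring rule: credit a pair only if both adversarial
    # prompts are rendered correctly, so one-sided guessing earns nothing.
    ok_a = matches_expectation(generate_image(prompt_a), expected_a)
    ok_b = matches_expectation(generate_image(prompt_b), expected_b)
    return 1.0 if (ok_a and ok_b) else 0.0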