Showing 1 - 10 of 1,036 for search: '"Lu, Yujie"'
Recent breakthroughs in vision-language models (VLMs) emphasize the necessity of benchmarking human preferences in real-world multimodal interactions. To address this gap, we launched WildVision-Arena (WV-Arena), an online platform that collects huma
External link:
http://arxiv.org/abs/2406.11069
Author:
He, Xuehai, Feng, Weixi, Zheng, Kaizhi, Lu, Yujie, Zhu, Wanrong, Li, Jiachen, Fan, Yue, Wang, Jianfeng, Li, Linjie, Yang, Zhengyuan, Lin, Kevin, Wang, William Yang, Wang, Lijuan, Wang, Xin Eric
Multimodal Large Language Models (MLLMs) demonstrate the emerging abilities of "world models" -- interpreting and reasoning about complex real-world dynamics. To assess these abilities, we posit videos are the ideal medium, as they encapsulate ric
External link:
http://arxiv.org/abs/2406.08407
We present a novel task and benchmark for evaluating the ability of text-to-image (T2I) generation models to produce images that align with commonsense in real life, which we call Commonsense-T2I. Given two adversarial text prompts containing an ident
External link:
http://arxiv.org/abs/2406.07546
The rapid progress in Multimodal Large Language Models (MLLMs) has significantly advanced their ability to process and understand complex visual and textual information. However, the integration of multiple images and extensive textual contexts remai
External link:
http://arxiv.org/abs/2405.14213
Published in:
Julius-Kühn-Archiv, Vol 463, Iss 1, Pp 239-245 (2018)
A number of remote sensing methods were developed and tested in commercial grain warehouses: probe pitfall traps attached to vacuum lines, surface pitfall traps equipped with video cameras and white boards on grain surface monitored with video camer
External link:
https://doaj.org/article/5ca7828ba0f84360b237cf6468b03bc2
Author:
Saxon, Michael, Jahara, Fatima, Khoshnoodi, Mahsa, Lu, Yujie, Sharma, Aditya, Wang, William Yang
With advances in the quality of text-to-image (T2I) models has come interest in benchmarking their prompt faithfulness: the semantic coherence of generated images to the prompts they were conditioned on. A variety of T2I faithfulness metrics have been
External link:
http://arxiv.org/abs/2404.04251
Neural implicit representation of geometric shapes has witnessed considerable advancements in recent years. However, common distance field based implicit representations, specifically signed distance field (SDF) for watertight shapes or unsigned dist
External link:
http://arxiv.org/abs/2403.01414
Published in:
Julius-Kühn-Archiv, Vol 463, Iss 2, Pp 1043-1045 (2018)
The lesser grain borer, Rhyzopertha dominica, is one of the serious cosmopolitan pests of stored grain. Highly phosphine-resistant R. dominica has been reported in several countries. The evolution of strong phosphine resistance is a major challen
External link:
https://doaj.org/article/aadc5f99034947849778eb6852d51f61
Recent multimodal large language models (MLLMs) have shown promising instruction following capabilities on vision-language tasks. In this work, we introduce VISUAL MODALITY INSTRUCTION (VIM), and investigate how well multimodal models can understand
External link:
http://arxiv.org/abs/2311.17647
Author:
Zhang, Xinlu, Lu, Yujie, Wang, Weizhi, Yan, An, Yan, Jun, Qin, Lianke, Wang, Heng, Yan, Xifeng, Wang, William Yang, Petzold, Linda Ruth
Automatically evaluating vision-language tasks is challenging, especially when it comes to reflecting human judgments due to limitations in accounting for fine-grained details. Although GPT-4V has shown promising results in various multi-modal tasks,
External link:
http://arxiv.org/abs/2311.01361