Showing 1 - 10 of 43 for search: '"He, Xuehai"'
Author:
Zheng, Kaizhi, Chen, Xiaotong, He, Xuehai, Gu, Jing, Li, Linjie, Yang, Zhengyuan, Lin, Kevin, Wang, Jianfeng, Wang, Lijuan, Wang, Xin Eric
Given the steep learning curve of professional 3D software and the time-consuming process of managing large 3D assets, language-guided 3D scene editing has significant potential in fields such as virtual reality, augmented reality, and gaming. However...
External link:
http://arxiv.org/abs/2410.12836
Author:
He, Xuehai, Feng, Weixi, Zheng, Kaizhi, Lu, Yujie, Zhu, Wanrong, Li, Jiachen, Fan, Yue, Wang, Jianfeng, Li, Linjie, Yang, Zhengyuan, Lin, Kevin, Wang, William Yang, Wang, Lijuan, Wang, Xin Eric
Multimodal Large Language Models (MLLMs) demonstrate the emerging abilities of "world models" -- interpreting and reasoning about complex real-world dynamics. To assess these abilities, we posit videos are the ideal medium, as they encapsulate rich...
External link:
http://arxiv.org/abs/2406.08407
Large Multimodal Models (LMMs) have shown remarkable progress in medical Visual Question Answering (Med-VQA), achieving high accuracy on existing benchmarks. However, their reliability under robust evaluation is questionable. This study reveals that...
External link:
http://arxiv.org/abs/2405.20421
Author:
He, Xuehai, Zheng, Jian, Fang, Jacob Zhiyuan, Piramuthu, Robinson, Bansal, Mohit, Ordonez, Vicente, Sigurdsson, Gunnar A, Peng, Nanyun, Wang, Xin Eric
Controllable text-to-image (T2I) diffusion models generate images conditioned on both text prompts and semantic inputs of other modalities like edge maps. Nevertheless, current controllable T2I methods commonly face challenges related to efficiency and... (a minimal sketch of this kind of conditioning follows the link below)
External link:
http://arxiv.org/abs/2405.04834
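For orientation, the following is a minimal sketch of the generic controllable-T2I setup the snippet above describes (a diffusion pipeline conditioned on a text prompt plus an edge map), written with the Hugging Face diffusers library. It is not the paper's method, and the checkpoint names, file paths, and Canny preprocessing are illustrative assumptions.

# Minimal controllable text-to-image sketch (illustrative, not the paper's method):
# a Stable Diffusion pipeline steered by both a prompt and a Canny edge map.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Turn an input photo into an edge map that will control spatial structure.
source = np.array(Image.open("input.jpg").convert("RGB"))  # path is an assumption
edges = cv2.Canny(source, 100, 200)
edge_map = Image.fromarray(np.stack([edges] * 3, axis=-1))

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# The prompt controls semantics; the edge map controls layout.
image = pipe("a watercolor painting of a cottage", image=edge_map).images[0]
image.save("controlled_output.png")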
Author:
Li, Jiachen, Gao, Qiaozi, Johnston, Michael, Gao, Xiaofeng, He, Xuehai, Shakiah, Suhaila, Shi, Hangjie, Ghanadan, Reza, Wang, William Yang
Prompt-based learning has been demonstrated as a compelling paradigm contributing to the tremendous success of large language models (LLMs). Inspired by their success in language tasks, existing research has leveraged LLMs in embodied instruction following... (an illustrative prompt-template sketch follows the link below)
External link:
http://arxiv.org/abs/2310.09676
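The sketch below is a hypothetical illustration of prompt-based embodied instruction following in the spirit the snippet describes: an LLM receives the instruction, an observation summary, and an allowed action set, and returns the next action. The action names, template wording, and the call_llm helper are assumptions, not the paper's interface.

# Hypothetical prompt-template sketch for embodied instruction following.
ACTIONS = ["MoveAhead", "TurnLeft", "TurnRight", "PickUp(object)", "Put(object, receptacle)"]

PROMPT_TEMPLATE = """You are a household robot.
Instruction: {instruction}
Current observation: {observation}
Allowed actions: {actions}
Respond with exactly one action from the allowed list.
Next action:"""


def next_action(instruction: str, observation: str, call_llm) -> str:
    """Fill the template and let the language model choose the next action."""
    prompt = PROMPT_TEMPLATE.format(
        instruction=instruction,
        observation=observation,
        actions=", ".join(ACTIONS),
    )
    return call_llm(prompt).strip()


# Example usage with a stub LLM that always moves ahead.
print(next_action(
    "Put the mug in the sink.",
    "A mug is on the counter two steps ahead.",
    call_llm=lambda prompt: "MoveAhead",
))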
Multimodal Large Language Models (MLLMs) demonstrate a profound capability in multimodal understanding. However, the simultaneous generation of images with coherent texts is still underdeveloped. Addressing this, we introduce a...
External link:
http://arxiv.org/abs/2310.02239
Author:
Feng, Weixi, Zhu, Wanrong, Fu, Tsu-jui, Jampani, Varun, Akula, Arjun, He, Xuehai, Basu, Sugato, Wang, Xin Eric, Wang, William Yang
Attaining a high degree of user controllability in visual generation often requires intricate, fine-grained inputs like layouts. However, such inputs impose a substantial burden on users when compared to simple text inputs. To address the issue, we... (an illustrative layout-specification sketch follows the link below)
External link:
http://arxiv.org/abs/2305.15393
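As a point of reference, the sketch below shows the kind of fine-grained layout input the snippet contrasts with plain text prompts: a list of labelled bounding boxes in normalized image coordinates. The schema and the layout_to_image call mentioned in the comment are illustrative assumptions, not an interface from the paper.

# Hypothetical layout specification: labelled boxes in normalized coordinates.
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    label: str   # object category to place in this region
    x: float     # left edge, 0..1
    y: float     # top edge, 0..1
    w: float     # width, 0..1
    h: float     # height, 0..1


layout: List[Box] = [
    Box("wooden table", 0.10, 0.55, 0.80, 0.40),
    Box("blue ceramic vase", 0.40, 0.25, 0.20, 0.35),
    Box("window", 0.65, 0.05, 0.30, 0.45),
]

# A layout-conditioned generator would consume the boxes together with a prompt,
# e.g. image = layout_to_image(prompt="a cozy living room", boxes=layout)
for box in layout:
    print(f"{box.label}: ({box.x}, {box.y}, {box.w}, {box.h})")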
Author:
He, Xuehai, Feng, Weixi, Fu, Tsu-Jui, Jampani, Varun, Akula, Arjun, Narayana, Pradyumna, Basu, Sugato, Wang, William Yang, Wang, Xin Eric
Diffusion models, such as Stable Diffusion, have shown incredible performance on text-to-image generation. Since text-to-image generation often requires models to generate visual concepts with fine-grained details and attributes specified in text prompts... (a minimal generation sketch follows the link below)
External link:
http://arxiv.org/abs/2305.10722
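The block below is a minimal sketch of the baseline setting the snippet describes: the stock Stable Diffusion pipeline from the diffusers library generating from a prompt that packs in several fine-grained attributes. It is not the paper's approach; the checkpoint name and prompt are illustrative assumptions.

# Baseline text-to-image generation with an attribute-rich prompt (illustrative).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Several attribute-object pairs that the model must bind correctly.
prompt = "a small red wooden boat with a striped white sail on a calm turquoise lake"
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("attribute_rich.png")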
Author:
He, Xuehai, Wang, Xin Eric
Despite the success of Transformer models in vision and language tasks, they often learn knowledge from enormous data implicitly and cannot utilize structured input data directly. On the other hand, structured learning approaches such as graph neural networks... (a minimal graph-masked attention sketch follows the link below)
External link:
http://arxiv.org/abs/2305.00581
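To make the contrast concrete, the sketch below shows one simple way a Transformer-style layer can consume structured input directly: attention scores are masked by a graph adjacency matrix so each token only attends to its graph neighbours. This is an illustration of the general idea under stated assumptions, not the paper's model.

# Graph-masked self-attention sketch in plain PyTorch (illustrative).
import torch
import torch.nn.functional as F


def graph_masked_attention(x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
    """x: (n, d) node features; adj: (n, n) 0/1 adjacency (1 = edge allowed)."""
    d = x.size(-1)
    scores = x @ x.transpose(0, 1) / d ** 0.5               # dense token-token scores
    scores = scores.masked_fill(adj == 0, float("-inf"))    # keep only graph edges
    weights = F.softmax(scores, dim=-1)
    return weights @ x                                       # aggregate neighbour features


# Tiny example: 4 nodes on a chain graph (with self-loops).
x = torch.randn(4, 8)
adj = torch.tensor([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
])
print(graph_masked_attention(x, adj).shape)  # torch.Size([4, 8])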
Author:
Feng, Weixi, He, Xuehai, Fu, Tsu-Jui, Jampani, Varun, Akula, Arjun, Narayana, Pradyumna, Basu, Sugato, Wang, Xin Eric, Wang, William Yang
Large-scale diffusion models have achieved state-of-the-art results on text-to-image synthesis (T2I) tasks. Despite their ability to generate high-quality yet creative images, we observe that attribute binding and compositional capabilities are still... (a minimal cross-attention sketch follows the link below)
External link:
http://arxiv.org/abs/2212.05032
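For context, the sketch below shows the text-to-image cross-attention step where attribute binding plays out: image latents attend over prompt-token embeddings, so a diffuse or misplaced attention map can attach an attribute to the wrong object. The dimensions and tensors are illustrative assumptions, and this is a generic illustration rather than the paper's technique.

# Generic cross-attention between image latents and prompt tokens (illustrative).
import torch
import torch.nn.functional as F

n_pixels, n_tokens, d = 64, 8, 32            # latent positions, prompt tokens, width
image_latents = torch.randn(n_pixels, d)     # queries come from the image side
text_tokens = torch.randn(n_tokens, d)       # keys/values come from the prompt

# Standard scaled dot-product cross-attention.
scores = image_latents @ text_tokens.transpose(0, 1) / d ** 0.5   # (64, 8)
attn = F.softmax(scores, dim=-1)             # how much each position reads each token
out = attn @ text_tokens                     # text-conditioned image features

# Compositional methods typically inspect or reshape `attn` so that, e.g., the
# token "red" stays concentrated on the same region as the token "boat".
print(attn.shape, out.shape)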