Showing 1 - 10 of 23 for search: "Ma, Ziqiao"
Authors:
Zhang, Zheyuan, Hu, Fengyuan, Lee, Jayjun, Shi, Freda, Kordjamshidi, Parisa, Chai, Joyce, Ma, Ziqiao
Spatial expressions in situated communication can be ambiguous, as their meanings vary depending on the frames of reference (FoR) adopted by speakers and listeners. While spatial language understanding and reasoning by vision-language models (VLMs) h
External link:
http://arxiv.org/abs/2410.17385
Authors:
Zhang, Yue, Ma, Ziqiao, Li, Jialu, Qiao, Yanyuan, Wang, Zun, Chai, Joyce, Wu, Qi, Bansal, Mohit, Kordjamshidi, Parisa
Vision-and-Language Navigation (VLN) has gained increasing attention over recent years and many approaches have emerged to advance their development. The remarkable achievements of foundation models have shaped the challenges and proposed methods for
External link:
http://arxiv.org/abs/2407.07035
Authors:
Chen, Xuweiyi, Ma, Ziqiao, Zhang, Xuejun, Xu, Sihan, Qian, Shengyi, Yang, Jianing, Fouhey, David F., Chai, Joyce
Large vision language models (LVLMs) often suffer from object hallucination, producing objects not present in the given images. While current benchmarks for object hallucination primarily concentrate on the presence of a single object class rather th
External link:
http://arxiv.org/abs/2407.06192
Authors:
Shen, Hua, Knearem, Tiffany, Ghosh, Reshmi, Alkiek, Kenan, Krishna, Kundan, Liu, Yachuan, Ma, Ziqiao, Petridis, Savvas, Peng, Yi-Hao, Qiwei, Li, Rakshit, Sushrita, Si, Chenglei, Xie, Yutong, Bigham, Jeffrey P., Bentley, Frank, Chai, Joyce, Lipton, Zachary, Mei, Qiaozhu, Mihalcea, Rada, Terry, Michael, Yang, Diyi, Morris, Meredith Ringel, Resnick, Paul, Jurgens, David
Recent advancements in general-purpose AI have highlighted the importance of guiding AI systems towards the intended goals, ethical principles, and values of individuals and groups, a concept broadly recognized as alignment. However, the lack of clar
External link:
http://arxiv.org/abs/2406.09264
Recent advancements in foundation models (FMs) have unlocked new prospects in autonomous driving, yet the experimental settings of these studies are preliminary, over-simplified, and fail to capture the complexity of real-world driving scenarios in h
External link:
http://arxiv.org/abs/2406.03008
Humans are efficient language learners and inherently social creatures. Our language development is largely shaped by our social interactions, for example, the demonstration and feedback from caregivers. Contrary to human language learning, recent ad
External link:
http://arxiv.org/abs/2405.13828
Most multimodal large language models (MLLMs) learn language-to-object grounding through causal language modeling where grounded objects are captured by bounding boxes as sequences of location tokens. This paradigm lacks pixel-level representations t
External link:
http://arxiv.org/abs/2402.16846
Despite recent advances in inversion-based editing, text-guided image manipulation remains challenging for diffusion models. The primary bottlenecks include 1) the time-consuming nature of the inversion process; 2) the struggle to balance consistency
External link:
http://arxiv.org/abs/2312.04965
Large Language Models (LLMs) have generated considerable interest and debate regarding their potential emergence of Theory of Mind (ToM). Several recent inquiries reveal a lack of robust ToM in these models and pose a pressing demand to develop new b
External link:
http://arxiv.org/abs/2310.19619
Diffusion models (DMs) have enabled breakthroughs in image synthesis tasks but lack an intuitive interface for consistent image-to-image (I2I) translation. Various methods have been explored to address this issue, including mask-based methods, attent
External link:
http://arxiv.org/abs/2310.13165