Search results

Report

Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Under Ambiguities

Authors: Zhang, Zheyuan; Hu, Fengyuan; Lee, Jayjun; Shi, Freda; Kordjamshidi, Parisa; Chai, Joyce; Ma, Ziqiao

Spatial expressions in situated communication can be ambiguous, as their meanings vary depending on the frames of reference (FoR) adopted by speakers and listeners. While spatial language understanding and reasoning by vision-language models (VLMs) h

External link: http://arxiv.org/abs/2410.17385

Report

Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models

Authors: Zhang, Yue; Ma, Ziqiao; Li, Jialu; Qiao, Yanyuan; Wang, Zun; Chai, Joyce; Wu, Qi; Bansal, Mohit; Kordjamshidi, Parisa

Vision-and-Language Navigation (VLN) has gained increasing attention over recent years and many approaches have emerged to advance their development. The remarkable achievements of foundation models have shaped the challenges and proposed methods for

External link: http://arxiv.org/abs/2407.07035

Report

Multi-Object Hallucination in Vision-Language Models

Authors: Chen, Xuweiyi; Ma, Ziqiao; Zhang, Xuejun; Xu, Sihan; Qian, Shengyi; Yang, Jianing; Fouhey, David F.; Chai, Joyce

Large vision language models (LVLMs) often suffer from object hallucination, producing objects not present in the given images. While current benchmarks for object hallucination primarily concentrate on the presence of a single object class rather th

External link: http://arxiv.org/abs/2407.06192

Report

Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions

Recent advancements in general-purpose AI have highlighted the importance of guiding AI systems towards the intended goals, ethical principles, and values of individuals and groups, a concept broadly recognized as alignment. However, the lack of clar

External link: http://arxiv.org/abs/2406.09264

Report

DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experiences

Authors: Huang, Yidong; Sansom, Jacob; Ma, Ziqiao; Gervits, Felix; Chai, Joyce

Recent advancements in foundation models (FMs) have unlocked new prospects in autonomous driving, yet the experimental settings of these studies are preliminary, over-simplified, and fail to capture the complexity of real-world driving scenarios in h

External link: http://arxiv.org/abs/2406.03008

Report

Babysit A Language Model From Scratch: Interactive Language Learning by Trials and Demonstrations

Authors: Ma, Ziqiao; Wang, Zekun; Chai, Joyce

Humans are efficient language learners and inherently social creatures. Our language development is largely shaped by our social interactions, for example, the demonstration and feedback from caregivers. Contrary to human language learning, recent ad

External link: http://arxiv.org/abs/2405.13828

Report

GROUNDHOG: Grounding Large Language Models to Holistic Segmentation

Authors: Zhang, Yichi; Ma, Ziqiao; Gao, Xiaofeng; Shakiah, Suhaila; Gao, Qiaozi; Chai, Joyce

Most multimodal large language models (MLLMs) learn language-to-object grounding through causal language modeling where grounded objects are captured by bounding boxes as sequences of location tokens. This paradigm lacks pixel-level representations t

External link: http://arxiv.org/abs/2402.16846

Report

Inversion-Free Image Editing with Natural Language

Authors: Xu, Sihan; Huang, Yidong; Pan, Jiayi; Ma, Ziqiao; Chai, Joyce

Despite recent advances in inversion-based editing, text-guided image manipulation remains challenging for diffusion models. The primary bottlenecks include 1) the time-consuming nature of the inversion process; 2) the struggle to balance consistency

External link: http://arxiv.org/abs/2312.04965

Report

Towards A Holistic Landscape of Situated Theory of Mind in Large Language Models

Authors: Ma, Ziqiao; Sansom, Jacob; Peng, Run; Chai, Joyce

Large Language Models (LLMs) have generated considerable interest and debate regarding their potential emergence of Theory of Mind (ToM). Several recent inquiries reveal a lack of robust ToM in these models and pose a pressing demand to develop new b

External link: http://arxiv.org/abs/2310.19619

Report

CycleNet: Rethinking Cycle Consistency in Text-Guided Diffusion for Image Manipulation

Authors: Xu, Sihan; Ma, Ziqiao; Huang, Yidong; Lee, Honglak; Chai, Joyce

Diffusion models (DMs) have enabled breakthroughs in image synthesis tasks but lack an intuitive interface for consistent image-to-image (I2I) translation. Various methods have been explored to address this issue, including mask-based methods, attent

External link: http://arxiv.org/abs/2310.13165
