Showing 1 - 10 of 101 results for search: '"Sun, Zhiqing"'
The optimal training configurations of large language models (LLMs) with respect to model sizes and compute budgets have been extensively studied. But how to optimally configure LLMs during inference has not been explored in sufficient depth. We stud… (a back-of-the-envelope cost sketch follows this entry)
External link:
http://arxiv.org/abs/2408.00724
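The inference-configuration question this abstract raises can be made concrete with the common approximation that a dense N-parameter transformer spends roughly 2·N FLOPs per generated token. The sketch below is a generic illustration under that assumption; the model sizes, sample counts, and token lengths are made up, not taken from the paper.

# Back-of-the-envelope inference-cost comparison (illustrative only;
# the paper's actual models, sample counts, and accuracies differ).
# Common approximation: one forward pass ~ 2 * N FLOPs per token
# for a dense N-parameter transformer.

def inference_flops(n_params: float, tokens_per_sample: int, n_samples: int) -> float:
    """Approximate total FLOPs to draw n_samples completions."""
    return 2.0 * n_params * tokens_per_sample * n_samples

# Hypothetical configs: a 7B model with best-of-64 sampling vs. a
# 70B model with a single sample, both emitting 512 tokens.
small = inference_flops(7e9, tokens_per_sample=512, n_samples=64)
large = inference_flops(70e9, tokens_per_sample=512, n_samples=1)

print(f"7B  x 64 samples: {small:.2e} FLOPs")
print(f"70B x  1 sample : {large:.2e} FLOPs")
# The 7B strategy here costs ~6.4x more compute, so which configuration
# is "compute-optimal" depends on how much accuracy the extra samples buy.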
Traditional language model-based theorem proving assumes that by training on a sufficient amount of formal proof data, a model will learn to prove theorems. Our key observation is that a wealth of informal information that is not present in formal pr…
External link:
http://arxiv.org/abs/2407.10040
Author:
Sun, Shenghuan, Goldgof, Gregory M., Schubert, Alexander, Sun, Zhiqing, Hartvigsen, Thomas, Butte, Atul J., Alaa, Ahmed
Vision-Language Models (VLM) can support clinicians by analyzing medical images and engaging in natural language interactions to assist in diagnostic and treatment tasks. However, VLMs often exhibit "hallucinogenic" behavior, generating textual outpu…
External link:
http://arxiv.org/abs/2405.19567
Author:
Ma, Pingchuan, Wang, Tsun-Hsuan, Guo, Minghao, Sun, Zhiqing, Tenenbaum, Joshua B., Rus, Daniela, Gan, Chuang, Matusik, Wojciech
Large Language Models have recently gained significant attention in scientific discovery for their extensive knowledge and advanced reasoning capabilities. However, they encounter challenges in effectively simulating observational feedback and ground…
External link:
http://arxiv.org/abs/2405.09783
Traditional reinforcement learning from human feedback (RLHF) approaches relying on parametric models like the Bradley-Terry model fall short in capturing the intransitivity and irrationality in human preferences. Recent advancements suggest that dir… (a short numeric illustration of the Bradley-Terry limitation follows this entry)
External link:
http://arxiv.org/abs/2405.00675
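For context on the Bradley-Terry limitation this abstract names: a Bradley-Terry model scores each response with a scalar reward r and predicts P(i beats j) = sigmoid(r_i - r_j), which always induces a transitive ordering. The sketch below (rewards and labels are illustrative, not from the paper) shows why a cyclic preference A > B > C > A is unrepresentable.

import math

def bt_prob(r_i: float, r_j: float) -> float:
    """Bradley-Terry win probability: P(i beats j) = sigmoid(r_i - r_j)."""
    return 1.0 / (1.0 + math.exp(-(r_i - r_j)))

# Any assignment of scalar rewards induces a total order,
# hence strictly transitive predicted preferences.
rewards = {"A": 2.0, "B": 1.0, "C": 0.0}
print(bt_prob(rewards["A"], rewards["B"]))  # > 0.5: A preferred to B
print(bt_prob(rewards["B"], rewards["C"]))  # > 0.5: B preferred to C
print(bt_prob(rewards["A"], rewards["C"]))  # forced > 0.5: A over C
# A cyclic (intransitive) preference A > B > C > A admits no consistent
# scalar reward assignment, which is the gap the abstract points to.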
Author:
Zhang, Ruohong, Gui, Liangke, Sun, Zhiqing, Feng, Yihao, Xu, Keyang, Zhang, Yuanhan, Fu, Di, Li, Chunyuan, Hauptmann, Alexander, Bisk, Yonatan, Yang, Yiming
Preference modeling techniques, such as direct preference optimization (DPO), have proven effective in enhancing the generalization abilities of large language models (LLMs). However, in tasks involving video instruction-following, providing informative… (a minimal DPO-loss sketch follows this entry)
External link:
http://arxiv.org/abs/2404.01258
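For context, the standard DPO objective trains the policy directly on preference pairs, with no separate reward model: the loss is -log sigmoid(beta * [(log pi_theta(y_w|x) - log pi_ref(y_w|x)) - (log pi_theta(y_l|x) - log pi_ref(y_l|x))]), where y_w is the preferred response. A minimal sketch, assuming per-response log-probabilities are already summed; tensor values and the beta setting are placeholders, not the paper's setup.

import torch
import torch.nn.functional as F

def dpo_loss(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Direct Preference Optimization loss for a batch of preference pairs.

    Each argument is a tensor of summed log-probabilities of the chosen (w)
    or rejected (l) response under the policy or the frozen reference model.
    """
    # Implicit rewards: beta-scaled log-ratios against the reference model.
    chosen_logratio = policy_logp_w - ref_logp_w
    rejected_logratio = policy_logp_l - ref_logp_l
    logits = beta * (chosen_logratio - rejected_logratio)
    # -log sigmoid(logits): push the chosen response above the rejected one.
    return -F.logsigmoid(logits).mean()

# Toy usage with made-up log-probabilities for four preference pairs.
lp_w = torch.tensor([-12.0, -8.5, -20.1, -7.3])
lp_l = torch.tensor([-13.4, -9.0, -19.8, -9.1])
ref_w = torch.tensor([-12.5, -8.7, -20.0, -7.9])
ref_l = torch.tensor([-13.0, -8.8, -19.9, -8.6])
print(dpo_loss(lp_w, lp_l, ref_w, ref_l))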
Author:
Sun, Zhiqing, Yu, Longhui, Shen, Yikang, Liu, Weiyang, Yang, Yiming, Welleck, Sean, Gan, Chuang
Current AI alignment methodologies rely on human-provided demonstrations or judgments, and the learned capabilities of AI systems would be upper-bounded by human capabilities as a result. This raises a challenging research question: How can we keep i…
External link:
http://arxiv.org/abs/2403.09472
Hallucinations pose a significant challenge to the reliability of large language models (LLMs) in critical domains. Recent benchmarks designed to assess LLM hallucinations within conventional NLP tasks, such as knowledge-intensive question answering…
External link:
http://arxiv.org/abs/2403.04307
Author:
Jiang, Zhengbao, Sun, Zhiqing, Shi, Weijia, Rodriguez, Pedro, Zhou, Chunting, Neubig, Graham, Lin, Xi Victoria, Yih, Wen-tau, Iyer, Srinivasan
In order for large language model (LLM)-based assistants to effectively adapt to evolving information needs, it must be possible to update their factual knowledge through continued training on new data. The standard recipe for doing so involves conti… (a generic continued-training sketch follows this entry)
External link:
http://arxiv.org/abs/2402.12847
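The "standard recipe" gestured at above is, at its simplest, continued next-token training on documents carrying the new facts. A minimal sketch using the Hugging Face transformers API; the checkpoint name, sample documents, and learning rate are placeholders, and the paper's actual recipe may differ.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM checkpoint works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

new_documents = [
    "In 2024, the fictional Westbrook Bridge was closed for repairs.",
    # ... more recent documents carrying the new facts ...
]

model.train()
for doc in new_documents:
    batch = tokenizer(doc, return_tensors="pt")
    # Standard causal-LM objective: passing labels = input_ids makes the
    # model compute next-token prediction loss over the new text.
    out = model(**batch, labels=batch["input_ids"])
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()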
Reinforcement Learning from Human Feedback (RLHF) is a widely adopted approach for aligning large language models with human values. However, RLHF relies on a reward model that is trained with a limited amount of human preference data, which could le… (a generic sketch of the pairwise reward-model loss follows this entry)
External link:
http://arxiv.org/abs/2401.16635
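For context on the limited-data reward model this abstract mentions: RLHF reward models are commonly fit on pairwise human preferences with the loss -log sigmoid(r(x, y_w) - r(x, y_l)). A minimal sketch of that loss; the scores and the scalar-head setup are illustrative assumptions, not the paper's method.

import torch
import torch.nn.functional as F

def reward_model_loss(reward_chosen: torch.Tensor,
                      reward_rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise preference loss: -log sigmoid(r_chosen - r_rejected).

    Inputs are scalar rewards the model assigns to the human-preferred
    and human-rejected responses for the same prompt.
    """
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy batch of scores from a hypothetical scalar-head reward model.
r_chosen = torch.tensor([1.3, 0.2, 2.1])
r_rejected = torch.tensor([0.7, 0.5, 1.0])
print(f"pairwise loss: {reward_model_loss(r_chosen, r_rejected).item():.4f}")
# With only a small preference set, the learned r(.) generalizes poorly,
# and a policy optimized against it can drift from true human intent.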