Showing 1 - 10 of 101 results for search: '"Sun, Zhiqing"'
The optimal training configurations of large language models (LLMs) with respect to model sizes and compute budgets have been extensively studied. But how to optimally configure LLMs during inference has not been explored in sufficient depth. We stud… (a back-of-the-envelope cost sketch follows this entry)
External link:
http://arxiv.org/abs/2408.00724
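The inference-configuration question this abstract raises can be made concrete with the common approximation that a dense N-parameter transformer spends roughly 2·N FLOPs per generated token. The sketch below is a generic illustration under that assumption; the model sizes, sample counts, and token lengths are made up, not taken from the paper.

# Back-of-the-envelope inference-cost comparison (illustrative only;
# the paper's actual models, sample counts, and accuracies differ).
# Common approximation: one forward pass ~ 2 * N FLOPs per token
# for a dense N-parameter transformer.

def inference_flops(n_params: float, tokens_per_sample: int, n_samples: int) -> float:
    """Approximate total FLOPs to draw n_samples completions."""
    return 2.0 * n_params * tokens_per_sample * n_samples

# Hypothetical configs: a 7B model with best-of-64 sampling vs. a
# 70B model with a single sample, both emitting 512 tokens.
small = inference_flops(7e9, tokens_per_sample=512, n_samples=64)
large = inference_flops(70e9, tokens_per_sample=512, n_samples=1)

print(f"7B  x 64 samples: {small:.2e} FLOPs")
print(f"70B x  1 sample : {large:.2e} FLOPs")
# The 7B strategy here costs ~6.4x more compute, so which configuration
# is "compute-optimal" depends on how much accuracy the extra samples buy.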
Traditional language model-based theorem proving assumes that by training on a sufficient amount of formal proof data, a model will learn to prove theorems. Our key observation is that a wealth of informal information that is not present in formal pr…
External link:
http://arxiv.org/abs/2407.10040
Author:
Sun, Shenghuan, Goldgof, Gregory M., Schubert, Alexander, Sun, Zhiqing, Hartvigsen, Thomas, Butte, Atul J., Alaa, Ahmed
Vision-Language Models (VLM) can support clinicians by analyzing medical images and engaging in natural language interactions to assist in diagnostic and treatment tasks. However, VLMs often exhibit "hallucinogenic" behavior, generating textual outpu…
External link:
http://arxiv.org/abs/2405.19567
Author:
Ma, Pingchuan, Wang, Tsun-Hsuan, Guo, Minghao, Sun, Zhiqing, Tenenbaum, Joshua B., Rus, Daniela, Gan, Chuang, Matusik, Wojciech
Large Language Models have recently gained significant attention in scientific discovery for their extensive knowledge and advanced reasoning capabilities. However, they encounter challenges in effectively simulating observational feedback and ground…
External link:
http://arxiv.org/abs/2405.09783
Traditional reinforcement learning from human feedback (RLHF) approaches relying on parametric models like the Bradley-Terry model fall short in capturing the intransitivity and irrationality in human preferences. Recent advancements suggest that dir… (a short numeric illustration of the Bradley-Terry limitation follows this entry)
External link:
http://arxiv.org/abs/2405.00675
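For context on the Bradley-Terry limitation this abstract names: a Bradley-Terry model scores each response with a scalar reward r and predicts P(i beats j) = sigmoid(r_i - r_j), which always induces a transitive ordering. The sketch below (rewards and labels are illustrative, not from the paper) shows why a cyclic preference A > B > C > A is unrepresentable.

import math

def bt_prob(r_i: float, r_j: float) -> float:
    """Bradley-Terry win probability: P(i beats j) = sigmoid(r_i - r_j)."""
    return 1.0 / (1.0 + math.exp(-(r_i - r_j)))

# Any assignment of scalar rewards induces a total order,
# hence strictly transitive predicted preferences.
rewards = {"A": 2.0, "B": 1.0, "C": 0.0}
print(bt_prob(rewards["A"], rewards["B"]))  # > 0.5: A preferred to B
print(bt_prob(rewards["B"], rewards["C"]))  # > 0.5: B preferred to C
print(bt_prob(rewards["A"], rewards["C"]))  # forced > 0.5: A over C
# A cyclic (intransitive) preference A > B > C > A admits no consistent
# scalar reward assignment, which is the gap the abstract points to.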
Author:
Zhang, Ruohong, Gui, Liangke, Sun, Zhiqing, Feng, Yihao, Xu, Keyang, Zhang, Yuanhan, Fu, Di, Li, Chunyuan, Hauptmann, Alexander, Bisk, Yonatan, Yang, Yiming
Preference modeling techniques, such as direct preference optimization (DPO), have proven effective in enhancing the generalization abilities of large language models (LLMs). However, in tasks involving video instruction-following, providing informative… (a minimal DPO-loss sketch follows this entry)
External link:
http://arxiv.org/abs/2404.01258
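For context, the standard DPO objective trains the policy directly on preference pairs, with no separate reward model: the loss is -log sigmoid(beta * [(log pi_theta(y_w|x) - log pi_ref(y_w|x)) - (log pi_theta(y_l|x) - log pi_ref(y_l|x))]), where y_w is the preferred response. A minimal sketch, assuming per-response log-probabilities are already summed; tensor values and the beta setting are placeholders, not the paper's setup.

import torch
import torch.nn.functional as F

def dpo_loss(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Direct Preference Optimization loss for a batch of preference pairs.

    Each argument is a tensor of summed log-probabilities of the chosen (w)
    or rejected (l) response under the policy or the frozen reference model.
    """
    # Implicit rewards: beta-scaled log-ratios against the reference model.
    chosen_logratio = policy_logp_w - ref_logp_w
    rejected_logratio = policy_logp_l - ref_logp_l
    logits = beta * (chosen_logratio - rejected_logratio)
    # -log sigmoid(logits): push the chosen response above the rejected one.
    return -F.logsigmoid(logits).mean()

# Toy usage with made-up log-probabilities for four preference pairs.
lp_w = torch.tensor([-12.0, -8.5, -20.1, -7.3])
lp_l = torch.tensor([-13.4, -9.0, -19.8, -9.1])
ref_w = torch.tensor([-12.5, -8.7, -20.0, -7.9])
ref_l = torch.tensor([-13.0, -8.8, -19.9, -8.6])
print(dpo_loss(lp_w, lp_l, ref_w, ref_l))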
Author:
Sun, Zhiqing, Yu, Longhui, Shen, Yikang, Liu, Weiyang, Yang, Yiming, Welleck, Sean, Gan, Chuang
Current AI alignment methodologies rely on human-provided demonstrations or judgments, and the learned capabilities of AI systems would be upper-bounded by human capabilities as a result. This raises a challenging research question: How can we keep i…
External link:
http://arxiv.org/abs/2403.09472
Hallucinations pose a significant challenge to the reliability of large language models (LLMs) in critical domains. Recent benchmarks designed to assess LLM hallucinations within conventional NLP tasks, such as knowledge-intensive question answering…
External link:
http://arxiv.org/abs/2403.04307
Author:
Jiang, Zhengbao, Sun, Zhiqing, Shi, Weijia, Rodriguez, Pedro, Zhou, Chunting, Neubig, Graham, Lin, Xi Victoria, Yih, Wen-tau, Iyer, Srinivasan
In order for large language model (LLM)-based assistants to effectively adapt to evolving information needs, it must be possible to update their factual knowledge through continued training on new data. The standard recipe for doing so involves conti… (a generic continued-training sketch follows this entry)
External link:
http://arxiv.org/abs/2402.12847
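The "standard recipe" gestured at above is, at its simplest, continued next-token training on documents carrying the new facts. A minimal sketch using the Hugging Face transformers API; the checkpoint name, sample documents, and learning rate are placeholders, and the paper's actual recipe may differ.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM checkpoint works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

new_documents = [
    "In 2024, the fictional Westbrook Bridge was closed for repairs.",
    # ... more recent documents carrying the new facts ...
]

model.train()
for doc in new_documents:
    batch = tokenizer(doc, return_tensors="pt")
    # Standard causal-LM objective: passing labels = input_ids makes the
    # model compute next-token prediction loss over the new text.
    out = model(**batch, labels=batch["input_ids"])
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()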
Reinforcement Learning from Human Feedback (RLHF) is a widely adopted approach for aligning large language models with human values. However, RLHF relies on a reward model that is trained with a limited amount of human preference data, which could le… (a generic sketch of the pairwise reward-model loss follows this entry)
External link:
http://arxiv.org/abs/2401.16635
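For context on the limited-data reward model this abstract mentions: RLHF reward models are commonly fit on pairwise human preferences with the loss -log sigmoid(r(x, y_w) - r(x, y_l)). A minimal sketch of that loss; the scores and the scalar-head setup are illustrative assumptions, not the paper's method.

import torch
import torch.nn.functional as F

def reward_model_loss(reward_chosen: torch.Tensor,
                      reward_rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise preference loss: -log sigmoid(r_chosen - r_rejected).

    Inputs are scalar rewards the model assigns to the human-preferred
    and human-rejected responses for the same prompt.
    """
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy batch of scores from a hypothetical scalar-head reward model.
r_chosen = torch.tensor([1.3, 0.2, 2.1])
r_rejected = torch.tensor([0.7, 0.5, 1.0])
print(f"pairwise loss: {reward_model_loss(r_chosen, r_rejected).item():.4f}")
# With only a small preference set, the learned r(.) generalizes poorly,
# and a policy optimized against it can drift from true human intent.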