Showing 1 - 10 of 79
for search: '"Zhu, Banghua"'
Author:
Li, Tianle, Chiang, Wei-Lin, Frick, Evan, Dunlap, Lisa, Wu, Tianhao, Zhu, Banghua, Gonzalez, Joseph E., Stoica, Ion
The rapid evolution of language models has necessitated the development of more challenging benchmarks. Current static benchmarks often struggle to consistently distinguish between the capabilities of different models and fail to align with real-world …
External link:
http://arxiv.org/abs/2406.11939
Let $\mathsf{TH}_k$ denote the $k$-out-of-$n$ threshold function: given $n$ input Boolean variables, the output is $1$ if and only if at least $k$ of the inputs are $1$. We consider the problem of computing the $\mathsf{TH}_k$ function using noisy re…
External link:
http://arxiv.org/abs/2403.07227
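The noiseless version of the threshold function defined in this abstract is simple to state in code. The sketch below is purely illustrative and is not from the paper, which studies the noisy setting:

```python
def threshold(inputs, k):
    """k-out-of-n threshold TH_k: return 1 iff at least k of the
    Boolean inputs are 1, else 0 (noiseless version)."""
    return 1 if sum(inputs) >= k else 0

# Majority on 5 bits is TH_3:
print(threshold([1, 0, 1, 1, 0], 3))  # -> 1
print(threshold([1, 0, 0, 1, 0], 3))  # -> 0
```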
Author:
Chiang, Wei-Lin, Zheng, Lianmin, Sheng, Ying, Angelopoulos, Anastasios Nikolas, Li, Tianle, Li, Dacheng, Zhang, Hao, Zhu, Banghua, Jordan, Michael, Gonzalez, Joseph E., Stoica, Ion
Large Language Models (LLMs) have unlocked new capabilities and applications; however, evaluating the alignment with human preferences still poses significant challenges. To address this issue, we introduce Chatbot Arena, an open platform for evaluating …
External link:
http://arxiv.org/abs/2403.04132
Generative AI's expanding footprint across numerous industries has led to both excitement and increased scrutiny. This paper delves into the unique security challenges posed by Generative AI, and outlines potential research directions for managing th…
External link:
http://arxiv.org/abs/2402.12617
Large language models (LLMs) have achieved huge success in numerous natural language processing (NLP) tasks. However, they face the challenge of significant resource consumption during inference. In this paper, we aim to improve the inference efficiency …
External link:
http://arxiv.org/abs/2402.01173
Reinforcement Learning from Human Feedback (RLHF) is a pivotal technique that aligns language models closely with human-centric values. The initial phase of RLHF involves learning human values using a reward model from ranking data. It is observed th…
External link:
http://arxiv.org/abs/2401.16335
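The reward-modeling phase this abstract describes is commonly fit to ranking data with a Bradley-Terry style objective: maximize the probability that the preferred response receives the higher reward. The sketch below illustrates that standard loss; it is an assumption about the setup, not code from the paper:

```python
import math

def bt_loss(reward_chosen, reward_rejected):
    """Bradley-Terry negative log-likelihood that the chosen
    response outscores the rejected one."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A larger reward margin for the preferred response yields a smaller loss:
print(bt_loss(2.0, 0.0) < bt_loss(0.5, 0.0))  # -> True
```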
Author:
Sheng, Ying, Cao, Shiyi, Li, Dacheng, Zhu, Banghua, Li, Zhuohan, Zhuo, Danyang, Gonzalez, Joseph E., Stoica, Ion
High-demand LLM inference services (e.g., ChatGPT and BARD) support a wide range of requests from short chat conversations to long document reading. To ensure that all client requests are processed fairly, most major LLM inference services have reque…
External link:
http://arxiv.org/abs/2401.00588
Published in:
ICLR 2024 (Spotlight)
Reinforcement learning (RL) theory has largely focused on proving minimax sample complexity bounds. These require strategic exploration algorithms that use relatively limited function classes for representing the policy or value function. Our goal is …
External link:
http://arxiv.org/abs/2312.08369
Author:
Huang, Baihe, Zhu, Hanlin, Zhu, Banghua, Ramchandran, Kannan, Jordan, Michael I., Lee, Jason D., Jiao, Jiantao
We study statistical watermarking by formulating it as a hypothesis testing problem, a general framework which subsumes all previous statistical watermarking methods. Key to our formulation is a coupling of the output tokens and the rejection region, …
External link:
http://arxiv.org/abs/2312.07930
Author:
Sheng, Ying, Cao, Shiyi, Li, Dacheng, Hooper, Coleman, Lee, Nicholas, Yang, Shuo, Chou, Christopher, Zhu, Banghua, Zheng, Lianmin, Keutzer, Kurt, Gonzalez, Joseph E., Stoica, Ion
The "pretrain-then-finetune" paradigm is commonly adopted in the deployment of large language models. Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning method, is often employed to adapt a base model to a multitude of tasks, resulting in …
External link:
http://arxiv.org/abs/2311.03285
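The LoRA idea referenced in this abstract can be sketched in a few lines: instead of updating the full weight matrix, train a low-rank update so each task's adapter stays small. The shapes and values below are illustrative only, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2                              # hidden size, adapter rank (r << d)
W = rng.standard_normal((d, d))          # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01   # trainable rank-r factor
B = np.zeros((d, r))                     # zero-init: adapter starts as a no-op

x = rng.standard_normal(d)
h = x @ (W + B @ A).T                    # adapted forward pass
# With B = 0, the adapted model reproduces the base model exactly:
assert np.allclose(h, x @ W.T)
```

Because only `A` and `B` (2·d·r parameters) are trained per task, many adapters can share one frozen base model, which is the multi-task serving setting the paper addresses.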