Showing 1 - 10 of 2,810
for search: '"Beirami A"'
Author:
Aminian, Gholamali, Asadi, Amir R., Li, Tian, Beirami, Ahmad, Reinert, Gesine, Cohen, Samuel N.
The generalization error (risk) of a supervised statistical learning algorithm quantifies its prediction ability on previously unseen data. Inspired by exponential tilting, Li et al. (2021) proposed the tilted empirical risk as a non-linear risk metric...
External link:
http://arxiv.org/abs/2409.19431
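To make the entry above concrete: the tilted empirical risk of Li et al. (2021) replaces the ordinary average loss with a log-mean-exp aggregate, so a tilt t > 0 emphasizes high-loss samples and t < 0 suppresses them. A minimal sketch; the function name and toy losses are illustrative, not taken from the paper:

import numpy as np

def tilted_empirical_risk(losses, t):
    # (1/t) * log( mean( exp(t * loss_i) ) ); as t -> 0 this recovers the plain average.
    losses = np.asarray(losses, dtype=float)
    if t == 0.0:
        return losses.mean()
    m = (t * losses).max()  # shift for numerical stability (log-mean-exp trick)
    return (m + np.log(np.exp(t * losses - m).mean())) / t

losses = [0.1, 0.2, 0.15, 2.0]
print(tilted_empirical_risk(losses, t=0.0))  # ordinary empirical risk
print(tilted_empirical_risk(losses, t=5.0))  # tilted toward the worst-case samples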
Prompting Large Language Models (LLMs) has created new and interesting means for classifying textual data. While evaluating and remediating group fairness is a well-studied problem in the classifier fairness literature, some classical approaches (e.g., r...
External link:
http://arxiv.org/abs/2406.16738
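As background for the group-fairness evaluation mentioned above, one commonly used metric is the demographic parity gap: the largest difference in positive-prediction rate between groups. The sketch below is a generic illustration, not code or a metric taken from this particular paper; the data are made up:

import numpy as np

def demographic_parity_gap(preds, groups):
    # Largest difference in positive-prediction rate between any two groups.
    preds, groups = np.asarray(preds), np.asarray(groups)
    rates = [preds[groups == g].mean() for g in np.unique(groups)]
    return max(rates) - min(rates)

# Toy LLM-as-classifier outputs (1 = positive label) for two groups
preds  = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(demographic_parity_gap(preds, groups))  # 0.75 - 0.25 = 0.5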
Author:
Qi, Xiangyu, Panda, Ashwinee, Lyu, Kaifeng, Ma, Xiao, Roy, Subhrajit, Beirami, Ahmad, Mittal, Prateek, Henderson, Peter
The safety alignment of current Large Language Models (LLMs) is vulnerable. Relatively simple attacks, or even benign fine-tuning, can jailbreak aligned models. We argue that many of these vulnerabilities are related to a shared underlying issue: safety...
External link:
http://arxiv.org/abs/2406.05946
Author:
Fisch, Adam, Eisenstein, Jacob, Zayats, Vicky, Agarwal, Alekh, Beirami, Ahmad, Nagpal, Chirag, Shaw, Pete, Berant, Jonathan
Language model (LM) post-training (or alignment) involves maximizing a reward function that is derived from preference annotations. Direct Preference Optimization (DPO) is a popular offline alignment method that trains a policy directly on preference...
External link:
http://arxiv.org/abs/2405.19316
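For context on the entry above: the standard DPO objective (Rafailov et al., 2023) trains the policy to increase its log-likelihood ratio, relative to a frozen reference model, on the preferred response versus the dispreferred one. A minimal PyTorch sketch of that standard loss, not of the method proposed in this particular paper; the tensor values are toy sequence log-probabilities:

import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    # -log sigmoid( beta * [ (log pi(y_w) - log pi_ref(y_w)) - (log pi(y_l) - log pi_ref(y_l)) ] )
    policy_margin = logp_chosen - logp_rejected
    ref_margin = ref_logp_chosen - ref_logp_rejected
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()

logp_c = torch.tensor([-12.0, -15.0])   # policy log-prob of chosen responses
logp_r = torch.tensor([-14.0, -14.5])   # policy log-prob of rejected responses
ref_c  = torch.tensor([-13.0, -15.2])   # reference-model log-probs
ref_r  = torch.tensor([-13.5, -14.8])
print(dpo_loss(logp_c, logp_r, ref_c, ref_r))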
Author:
Sarkar, Pritam, Ebrahimi, Sayna, Etemad, Ali, Beirami, Ahmad, Arık, Sercan Ö., Pfister, Tomas
Despite their significant advancements, Multimodal Large Language Models (MLLMs) often generate factually inaccurate information, referred to as hallucination. In this work, we address object hallucinations in MLLMs, where information is generated about...
External link:
http://arxiv.org/abs/2405.18654
Aligning language models (LMs) based on human-annotated preference data is a crucial step in obtaining practical and performant LM-based systems. However, multilingual human preference data are difficult to obtain at scale, making it challenging to e...
External link:
http://arxiv.org/abs/2404.12318
Let $p$ denote a generative language model. Let $r$ denote a reward model that returns a scalar capturing the degree to which a draw from $p$ is preferred. The goal of language model alignment is to alter $p$ to a new distribution $\phi$ that res...
External link:
http://arxiv.org/abs/2404.01730
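A worked toy example for the alignment setup above, under the common KL-regularized formulation: the maximizer of E_phi[r] - beta * KL(phi || p) is the exponentially tilted distribution phi(y) proportional to p(y) * exp(r(y)/beta). This is a generic illustration on a three-outcome space, not code from the paper:

import numpy as np

def tilted_aligned_dist(p, rewards, beta):
    # phi(y) proportional to p(y) * exp(r(y) / beta), computed in log space for stability.
    logits = np.log(np.asarray(p, dtype=float)) + np.asarray(rewards, dtype=float) / beta
    logits -= logits.max()
    phi = np.exp(logits)
    return phi / phi.sum()

p       = np.array([0.5, 0.3, 0.2])   # base model over three outcomes
rewards = np.array([0.0, 1.0, 2.0])
print(tilted_aligned_dist(p, rewards, beta=1.0))
print(tilted_aligned_dist(p, rewards, beta=0.1))  # smaller beta puts more mass on high reward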
Author:
Sun, Ziteng, Mendlovic, Uri, Leviathan, Yaniv, Aharoni, Asaf, Beirami, Ahmad, Ro, Jae Hun, Suresh, Ananda Theertha
Speculative decoding is an effective method for lossless acceleration of large language models during inference. It uses a fast model to draft a block of tokens, which are then verified in parallel by the target model, and provides a guarantee that the...
External link:
http://arxiv.org/abs/2403.10444
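For reference, the standard token-level verification rule from the speculative decoding literature (the paper above concerns block-level verification, so this is background rather than its method): accept a drafted token with probability min(1, p_target/p_draft), otherwise resample from the normalized residual, which keeps the output an exact sample from the target distribution. A toy sketch with made-up distributions:

import numpy as np

rng = np.random.default_rng(0)

def verify_token(draft_token, p_target, p_draft):
    # Accept with prob min(1, p_target/p_draft); on rejection, resample from
    # the normalized residual max(p_target - p_draft, 0). Output ~ p_target exactly.
    if rng.random() < min(1.0, p_target[draft_token] / p_draft[draft_token]):
        return draft_token
    residual = np.maximum(p_target - p_draft, 0.0)
    return rng.choice(len(p_target), p=residual / residual.sum())

p_draft  = np.array([0.55, 0.25, 0.15, 0.05])   # fast draft model over 4 tokens
p_target = np.array([0.40, 0.30, 0.20, 0.10])   # target model over the same tokens
drafted  = int(rng.choice(4, p=p_draft))
print(verify_token(drafted, p_target, p_draft))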
Red teaming is a common strategy for identifying weaknesses in generative language models (LMs), where adversarial prompts are produced that trigger an LM to generate unsafe responses. Red teaming is instrumental for both model alignment and evaluation...
External link:
http://arxiv.org/abs/2401.16656
Author:
Beirami, Ahmad, Agarwal, Alekh, Berant, Jonathan, D'Amour, Alexander, Eisenstein, Jacob, Nagpal, Chirag, Suresh, Ananda Theertha
A simple and effective method for the alignment of generative models is the best-of-$n$ policy, where $n$ samples are drawn from a base policy, ranked according to a reward function, and the highest-ranking one is selected. A commonly used analytical...
External link:
http://arxiv.org/abs/2401.01879
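The best-of-n policy described above is easy to state in code: draw n candidates from the base policy, score them with the reward model, and keep the argmax. A toy sketch in which the base policy and reward are made up; the last line evaluates log n - (n-1)/n, the analytical expression commonly quoted in the literature for the KL divergence between the best-of-n policy and the base policy:

import math
import numpy as np

rng = np.random.default_rng(0)

def best_of_n(base_sample, reward, n):
    # Draw n candidates from the base policy and return the highest-reward one.
    candidates = [base_sample() for _ in range(n)]
    return max(candidates, key=reward)

base_sample = lambda: int(rng.integers(0, 10))  # toy base policy: uniform over 10 "responses"
reward      = lambda y: y                       # toy reward: prefer larger responses
print(best_of_n(base_sample, reward, n=4))

n = 4
print(math.log(n) - (n - 1) / n)  # commonly cited approximation of KL(best-of-n || base)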