Výsledky vyhledávání

Report

Generalization Error of the Tilted Empirical Risk

Autor: Aminian, Gholamali, Asadi, Amir R., Li, Tian, Beirami, Ahmad, Reinert, Gesine, Cohen, Samuel N.

The generalization error (risk) of a supervised statistical learning algorithm quantifies its prediction ability on previously unseen data. Inspired by exponential tilting, Li et al. (2021) proposed the tilted empirical risk as a non-linear risk metr

Externí odkaz: http://arxiv.org/abs/2409.19431

Zobrazit plný text záznamu

Report

Inducing Group Fairness in LLM-Based Decisions

Autor: Atwood, James, Lahoti, Preethi, Balashankar, Ananth, Prost, Flavien, Beirami, Ahmad

Prompting Large Language Models (LLMs) has created new and interesting means for classifying textual data. While evaluating and remediating group fairness is a well-studied problem in classifier fairness literature, some classical approaches (e.g., r

Externí odkaz: http://arxiv.org/abs/2406.16738

Zobrazit plný text záznamu

Report

Safety Alignment Should Be Made More Than Just a Few Tokens Deep

Autor: Qi, Xiangyu, Panda, Ashwinee, Lyu, Kaifeng, Ma, Xiao, Roy, Subhrajit, Beirami, Ahmad, Mittal, Prateek, Henderson, Peter

The safety alignment of current Large Language Models (LLMs) is vulnerable. Relatively simple attacks, or even benign fine-tuning, can jailbreak aligned models. We argue that many of these vulnerabilities are related to a shared underlying issue: saf

Externí odkaz: http://arxiv.org/abs/2406.05946

Zobrazit plný text záznamu

Report

Robust Preference Optimization through Reward Model Distillation

Autor: Fisch, Adam, Eisenstein, Jacob, Zayats, Vicky, Agarwal, Alekh, Beirami, Ahmad, Nagpal, Chirag, Shaw, Pete, Berant, Jonathan

Language model (LM) post-training (or alignment) involves maximizing a reward function that is derived from preference annotations. Direct Preference Optimization (DPO) is a popular offline alignment method that trains a policy directly on preference

Externí odkaz: http://arxiv.org/abs/2405.19316

Zobrazit plný text záznamu

Report

Data-augmented phrase-level alignment for mitigating object hallucination

Autor: Sarkar, Pritam, Ebrahimi, Sayna, Etemad, Ali, Beirami, Ahmad, Arık, Sercan Ö., Pfister, Tomas

Despite their significant advancements, Multimodal Large Language Models (MLLMs) often generate factually inaccurate information, referred to as hallucination. In this work, we address object hallucinations in MLLMs, where information is generated ab

Externí odkaz: http://arxiv.org/abs/2405.18654

Zobrazit plný text záznamu

Report

Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment

Autor: Wu, Zhaofeng, Balashankar, Ananth, Kim, Yoon, Eisenstein, Jacob, Beirami, Ahmad

Aligning language models (LMs) based on human-annotated preference data is a crucial step in obtaining practical and performant LM-based systems. However, multilingual human preference data are difficult to obtain at scale, making it challenging to e

Externí odkaz: http://arxiv.org/abs/2404.12318

Zobrazit plný text záznamu

Report

Asymptotics of Language Model Alignment

Autor: Yang, Joy Qiping, Salamatian, Salman, Sun, Ziteng, Suresh, Ananda Theertha, Beirami, Ahmad

Let $p$ denote a generative language model. Let $r$ denote a reward model that returns a scalar that captures the degree at which a draw from $p$ is preferred. The goal of language model alignment is to alter $p$ to a new distribution $\phi$ that res

Externí odkaz: http://arxiv.org/abs/2404.01730

Zobrazit plný text záznamu

Report

Block Verification Accelerates Speculative Decoding

Autor: Sun, Ziteng, Mendlovic, Uri, Leviathan, Yaniv, Aharoni, Asaf, Beirami, Ahmad, Ro, Jae Hun, Suresh, Ananda Theertha

Speculative decoding is an effective method for lossless acceleration of large language models during inference. It uses a fast model to draft a block of tokens which are then verified in parallel by the target model, and provides a guarantee that th

Externí odkaz: http://arxiv.org/abs/2403.10444

Zobrazit plný text záznamu

Report

Gradient-Based Language Model Red Teaming

Autor: Wichers, Nevan, Denison, Carson, Beirami, Ahmad

Red teaming is a common strategy for identifying weaknesses in generative language models (LMs), where adversarial prompts are produced that trigger an LM to generate unsafe responses. Red teaming is instrumental for both model alignment and evaluati

Externí odkaz: http://arxiv.org/abs/2401.16656

Zobrazit plný text záznamu

Report

Theoretical guarantees on the best-of-n alignment policy

Autor: Beirami, Ahmad, Agarwal, Alekh, Berant, Jonathan, D'Amour, Alexander, Eisenstein, Jacob, Nagpal, Chirag, Suresh, Ananda Theertha

A simple and effective method for the alignment of generative models is the best-of-$n$ policy, where $n$ samples are drawn from a base policy, and ranked based on a reward function, and the highest ranking one is selected. A commonly used analytical

Externí odkaz: http://arxiv.org/abs/2401.01879

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání