Showing 1 - 10 of 3,289 for search: '"A, Ghavamzadeh"'
Conservative Contextual Bandits (CCBs) address safety in sequential decision making by requiring that an agent's policy, along with minimizing regret, also satisfy a safety constraint: its performance must not be worse than that of a baseline policy…
External link:
http://arxiv.org/abs/2412.06165
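The conservative-play idea behind CCBs can be sketched in a few lines: act with the learned policy only when a pessimistic estimate of cumulative reward stays above a (1 − α) fraction of the baseline's cumulative reward, and otherwise fall back to the baseline. This is an illustrative sketch of the generic conservative-bandit rule, not the algorithm of the linked paper; the function name and the exact budget form are assumptions.

```python
def conservative_choice(lcb_learned, baseline_reward,
                        cum_reward, cum_baseline, alpha=0.1):
    """Hypothetical conservative play rule (illustrative only).

    Play the learned arm only if a pessimistic (lower-confidence)
    estimate of total reward stays above a (1 - alpha) fraction of
    the baseline policy's cumulative reward.
    """
    pessimistic_total = cum_reward + lcb_learned            # worst case if we explore
    safety_budget = (1 - alpha) * (cum_baseline + baseline_reward)
    return pessimistic_total >= safety_budget               # True -> play learned arm
```

With a healthy pessimistic margin the learned arm is played; when exploring could violate the budget, the agent falls back to the baseline arm.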
In Markov decision processes (MDPs), quantile risk measures such as Value-at-Risk are a standard metric for modeling RL agents' preferences for certain outcomes. This paper proposes a new Q-learning algorithm for quantile optimization in MDPs…
External link:
http://arxiv.org/abs/2410.24128
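For reference, Value-at-Risk at level τ is simply the τ-quantile of the return distribution. An empirical version under the lower-quantile convention can be computed as below; this is the generic textbook definition, not the optimization method of the linked paper.

```python
import math

def value_at_risk(returns, tau):
    """Empirical Value-at-Risk: the smallest return r such that the
    empirical CDF satisfies F(r) >= tau (lower-quantile convention).
    Illustrative sketch only.
    """
    if not 0 < tau <= 1:
        raise ValueError("tau must lie in (0, 1]")
    ordered = sorted(returns)
    k = math.ceil(tau * len(ordered)) - 1   # index of the tau-quantile
    return ordered[max(k, 0)]
```

A risk-averse agent optimizing VaR at a small τ focuses on improving its worst returns rather than the mean.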
Author:
Kim, Kyuyoung, Jeong, Jongheon, An, Minyong, Ghavamzadeh, Mohammad, Dvijotham, Krishnamurthy, Shin, Jinwoo, Lee, Kimin
Fine-tuning text-to-image models with reward functions trained on human feedback data has proven effective for aligning model behavior with human intent. However, excessive optimization with such reward models, which serve as mere proxy objectives…
External link:
http://arxiv.org/abs/2404.01863
The goal of an offline reinforcement learning (RL) algorithm is to learn optimal policies using historical (offline) data, without access to the environment for online exploration. One of the main challenges in offline RL is the distribution shift…
External link:
http://arxiv.org/abs/2310.18434
Author:
Biyik, Erdem, Yao, Fan, Chow, Yinlam, Haig, Alex, Hsu, Chih-wei, Ghavamzadeh, Mohammad, Boutilier, Craig
Preference elicitation plays a central role in interactive recommender systems. Most preference elicitation approaches use either item queries that ask users to select preferred items from a slate, or attribute queries that ask them to express their…
External link:
http://arxiv.org/abs/2311.02085
Author:
Jeong, Jihwan, Chow, Yinlam, Tennenholtz, Guy, Hsu, Chih-Wei, Tulepbergenov, Azamat, Ghavamzadeh, Mohammad, Boutilier, Craig
Recommender systems (RSs) play a central role in connecting users to content, products, and services, matching candidate items to users based on their preferences. While traditional RSs rely on implicit user feedback signals, conversational RSs…
External link:
http://arxiv.org/abs/2310.06176
Published in:
International Conference on Machine Learning, 2024
We study how to make decisions that minimize Bayesian regret in offline linear bandits. Prior work suggests that one must take actions with maximum lower confidence bound (LCB) on their reward. We argue that the reliance on LCB is inherently flawed…
External link:
http://arxiv.org/abs/2306.01237
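The LCB rule the abstract refers to is simple to state: among the candidate actions, pick the one whose estimated reward minus its confidence width is largest. A minimal sketch of that standard rule (the one the paper argues against relying on), with illustrative names:

```python
def lcb_action(estimates, widths):
    """Return the index of the action maximizing the lower
    confidence bound mu_hat_i - width_i. Illustrative sketch of the
    standard LCB rule discussed (and critiqued) in the abstract above.
    """
    lcbs = [mu - w for mu, w in zip(estimates, widths)]
    return max(range(len(lcbs)), key=lcbs.__getitem__)
```

Note how a nominally better arm can lose under this rule when its confidence interval is wide, which is the pessimism the rule encodes.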
Author:
Fan, Ying, Watkins, Olivia, Du, Yuqing, Liu, Hao, Ryu, Moonkyung, Boutilier, Craig, Abbeel, Pieter, Ghavamzadeh, Mohammad, Lee, Kangwook, Lee, Kimin
Learning from human feedback has been shown to improve text-to-image models. These techniques first learn a reward function that captures what humans care about in the task and then improve the models based on the learned reward function. Even though…
External link:
http://arxiv.org/abs/2305.16381
Author:
Bravo-Hermsdorff, Gecia, Busa-Fekete, Róbert, Ghavamzadeh, Mohammad, Medina, Andres Muñoz, Syed, Umar
Modern statistical estimation is often performed in a distributed setting where each sample belongs to a single user who shares their data with a central server. Users are typically concerned with preserving the privacy of their samples, and also…
External link:
http://arxiv.org/abs/2305.07751
Published in:
Advances in Neural Information Processing Systems (NeurIPS), 2023
Optimizing static risk-averse objectives in Markov decision processes is difficult because they do not admit the standard dynamic programming equations common in Reinforcement Learning (RL) algorithms. Dynamic programming decompositions that augment the…
External link:
http://arxiv.org/abs/2304.12477