Showing 1 - 10 of 13,457 for search: '"Koppel A"'
The standard contextual bandit framework assumes fully observable and actionable contexts. In this work, we consider a new bandit setting with partially observable, correlated contexts and linear payoffs, motivated by the applications in finance wher…
External link:
http://arxiv.org/abs/2409.11521
We study the problem of finding an equilibrium of a mean field game (MFG) -- a policy performing optimally in a Markov decision process (MDP) determined by the induced mean field, where the mean field is a distribution over a population of agents and…
External link:
http://arxiv.org/abs/2408.04780
Adapting LLMs to Hebrew: Unveiling DictaLM 2.0 with Enhanced Vocabulary and Instruction Capabilities
Training large language models (LLMs) in low-resource languages such as Hebrew poses unique challenges. In this paper, we introduce DictaLM2.0 and DictaLM2.0-Instruct, two LLMs derived from the Mistral model, trained on a substantial corpus of approx…
External link:
http://arxiv.org/abs/2407.07080
Author:
Ding, Mucong, Chakraborty, Souradip, Agrawal, Vibhu, Che, Zora, Koppel, Alec, Wang, Mengdi, Bedi, Amrit, Huang, Furong
Reinforcement Learning from Human Feedback (RLHF) is a key method for aligning large language models (LLMs) with human preferences. However, current offline alignment approaches like DPO, IPO, and SLiC rely heavily on fixed preference datasets, which…
External link:
http://arxiv.org/abs/2406.15567
In this paper, we study the problem of robust cooperative multi-agent reinforcement learning (RL) where a large number of cooperative agents with distributed information aim to learn policies in the presence of \emph{stochastic} and \emph{non-stochas…
External link:
http://arxiv.org/abs/2406.13992
The conditional mean embedding (CME) encodes Markovian stochastic kernels through their actions on probability distributions embedded within the reproducing kernel Hilbert spaces (RKHS). The CME plays a key role in several well-known machine learning…
External link:
http://arxiv.org/abs/2405.07432
Published in:
In Proceedings of EACL 2023, 849-864 (2023)
Semitic morphologically-rich languages (MRLs) are characterized by extreme word ambiguity. Because most vowels are omitted in standard texts, many of the words are homographs with multiple possible analyses, each with a different pronunciation and di…
External link:
http://arxiv.org/abs/2405.07099
Author:
Patel, Bhrij, Suttle, Wesley A., Koppel, Alec, Aggarwal, Vaneet, Sadler, Brian M., Bedi, Amrit Singh, Manocha, Dinesh
In the context of average-reward reinforcement learning, the requirement for oracle knowledge of the mixing time, a measure of the duration a Markov chain under a fixed policy needs to achieve its stationary distribution, poses a significant challeng…
External link:
http://arxiv.org/abs/2403.11925
We address in this paper Reinforcement Learning (RL) among agents that are grouped into teams such that there is cooperation within each team but general-sum (non-zero sum) competition across different teams. To develop an RL method that provably ach…
External link:
http://arxiv.org/abs/2403.11345
Author:
Yu, Peihong, Mishra, Manav, Koppel, Alec, Busart, Carl, Narayan, Priya, Manocha, Dinesh, Bedi, Amrit, Tokekar, Pratap
Multi-Agent Reinforcement Learning (MARL) algorithms face the challenge of efficient exploration due to the exponential increase in the size of the joint state-action space. While demonstration-guided learning has proven beneficial in single-agent se…
External link:
http://arxiv.org/abs/2403.08936