Showing 1 - 10 of 13,457 for search: '"Koppel A"'
The standard contextual bandit framework assumes fully observable and actionable contexts. In this work, we consider a new bandit setting with partially observable, correlated contexts and linear payoffs, motivated by the applications in finance wher…
External link:
http://arxiv.org/abs/2409.11521
We study the problem of finding an equilibrium of a mean field game (MFG) -- a policy performing optimally in a Markov decision process (MDP) determined by the induced mean field, where the mean field is a distribution over a population of agents and…
External link:
http://arxiv.org/abs/2408.04780
Adapting LLMs to Hebrew: Unveiling DictaLM 2.0 with Enhanced Vocabulary and Instruction Capabilities
Training large language models (LLMs) in low-resource languages such as Hebrew poses unique challenges. In this paper, we introduce DictaLM2.0 and DictaLM2.0-Instruct, two LLMs derived from the Mistral model, trained on a substantial corpus of approx…
External link:
http://arxiv.org/abs/2407.07080
Author:
Ding, Mucong, Chakraborty, Souradip, Agrawal, Vibhu, Che, Zora, Koppel, Alec, Wang, Mengdi, Bedi, Amrit, Huang, Furong
Reinforcement Learning from Human Feedback (RLHF) is a key method for aligning large language models (LLMs) with human preferences. However, current offline alignment approaches like DPO, IPO, and SLiC rely heavily on fixed preference datasets, which…
External link:
http://arxiv.org/abs/2406.15567
In this paper, we study the problem of robust cooperative multi-agent reinforcement learning (RL) where a large number of cooperative agents with distributed information aim to learn policies in the presence of \emph{stochastic} and \emph{non-stochas…
External link:
http://arxiv.org/abs/2406.13992
The conditional mean embedding (CME) encodes Markovian stochastic kernels through their actions on probability distributions embedded within the reproducing kernel Hilbert spaces (RKHS). The CME plays a key role in several well-known machine learning…
External link:
http://arxiv.org/abs/2405.07432
Published in:
In Proceedings of EACL 2023, 849-864 (2023)
Semitic morphologically-rich languages (MRLs) are characterized by extreme word ambiguity. Because most vowels are omitted in standard texts, many of the words are homographs with multiple possible analyses, each with a different pronunciation and di…
External link:
http://arxiv.org/abs/2405.07099
Author:
Patel, Bhrij, Suttle, Wesley A., Koppel, Alec, Aggarwal, Vaneet, Sadler, Brian M., Bedi, Amrit Singh, Manocha, Dinesh
In the context of average-reward reinforcement learning, the requirement for oracle knowledge of the mixing time, a measure of the duration a Markov chain under a fixed policy needs to achieve its stationary distribution, poses a significant challeng…
External link:
http://arxiv.org/abs/2403.11925
We address in this paper Reinforcement Learning (RL) among agents that are grouped into teams such that there is cooperation within each team but general-sum (non-zero sum) competition across different teams. To develop an RL method that provably ach…
External link:
http://arxiv.org/abs/2403.11345
Author:
Yu, Peihong, Mishra, Manav, Koppel, Alec, Busart, Carl, Narayan, Priya, Manocha, Dinesh, Bedi, Amrit, Tokekar, Pratap
Multi-Agent Reinforcement Learning (MARL) algorithms face the challenge of efficient exploration due to the exponential increase in the size of the joint state-action space. While demonstration-guided learning has proven beneficial in single-agent se…
External link:
http://arxiv.org/abs/2403.08936