Search results

Report

A Single-Loop Finite-Time Convergent Policy Optimization Algorithm for Mean Field Games (and Average-Reward Markov Decision Processes)

Authors: Zeng, Sihan; Bhatt, Sujay; Koppel, Alec; Ganesh, Sumitra

We study the problem of finding an equilibrium of a mean field game (MFG) -- a policy performing optimally in a Markov decision process (MDP) determined by the induced mean field, where the mean field is a distribution over a population of agents and

External link: http://arxiv.org/abs/2408.04780

Report

Adapting LLMs to Hebrew: Unveiling DictaLM 2.0 with Enhanced Vocabulary and Instruction Capabilities

Authors: Shmidman, Shaltiel; Shmidman, Avi; Cohen, Amir DN; Koppel, Moshe

Training large language models (LLMs) in low-resource languages such as Hebrew poses unique challenges. In this paper, we introduce DictaLM2.0 and DictaLM2.0-Instruct, two LLMs derived from the Mistral model, trained on a substantial corpus of approx

External link: http://arxiv.org/abs/2407.07080

Academic article

The ortho-para transition, confinement and self-diffusion of H2 in three distinct carbide-derived carbons by quasi- and inelastic neutron scattering

Authors: Härmas Riinu, Palm Rasmus, Koppel Miriam, Kalder Laura, Russina Margarita, Kurig Heisi, Härk Eneli, Aruväli Jaan, Tallo Indrek, Embs Jan P., Lust Enn

Published in: EPJ Web of Conferences, Vol 286, p 05001 (2023)

Microporous carbon materials are promising for hydrogen storage due to their structural variety, high specific surface area, large pore volume and relatively low cost. Carbide-derived carbons are highly valued as model materials because their porous

External link: https://doaj.org/article/6bffa6d2910140708a2445959cff1d77

Report

SAIL: Self-Improving Efficient Online Alignment of Large Language Models

Authors: Ding, Mucong; Chakraborty, Souradip; Agrawal, Vibhu; Che, Zora; Koppel, Alec; Wang, Mengdi; Bedi, Amrit; Huang, Furong

Reinforcement Learning from Human Feedback (RLHF) is a key method for aligning large language models (LLMs) with human preferences. However, current offline alignment approaches like DPO, IPO, and SLiC rely heavily on fixed preference datasets, which

External link: http://arxiv.org/abs/2406.15567

Report

Robust Cooperative Multi-Agent Reinforcement Learning: A Mean-Field Type Game Perspective

Authors: Zaman, Muhammad Aneeq uz; Laurière, Mathieu; Koppel, Alec; Başar, Tamer

In this paper, we study the problem of robust cooperative multi-agent reinforcement learning (RL) where a large number of cooperative agents with distributed information aim to learn policies in the presence of stochastic and non-stochas

External link: http://arxiv.org/abs/2406.13992

Report

Compressed Online Learning of Conditional Mean Embedding

Authors: Hou, Boya; Sanjari, Sina; Koppel, Alec; Bose, Subhonmesh

The conditional mean embedding (CME) encodes Markovian stochastic kernels through their actions on probability distributions embedded within the reproducing kernel Hilbert spaces (RKHS). The CME plays a key role in several well-known machine learning

External link: http://arxiv.org/abs/2405.07432

Report

Do Pretrained Contextual Language Models Distinguish between Hebrew Homograph Analyses?

Authors: Shmidman, Avi; Shmidman, Cheyn Shmuel; Bareket, Dan; Koppel, Moshe; Tsarfaty, Reut

Published in: Proceedings of EACL 2023, 849-864 (2023)

Semitic morphologically-rich languages (MRLs) are characterized by extreme word ambiguity. Because most vowels are omitted in standard texts, many of the words are homographs with multiple possible analyses, each with a different pronunciation and di

External link: http://arxiv.org/abs/2405.07099

Academic article

Exploring Nurse and Patient Experiences of Developing Rapport During Oncology Ambulatory Care Videoconferencing Visits: Protocol for a Qualitative Study

Authors: Koppel, Paula D; De Gagne, Jennie C

Published in: JMIR Research Protocols, Vol 10, Iss 6, p e27940 (2021)

Background: Telehealth videoconferencing has largely been embraced by health care providers and patients during the COVID-19 pandemic; however, little is known about specific techniques for building rapport and provider-patient relationships in this ca

External link: https://doaj.org/article/2d28e6c4605b4524a165edec97aa82c5

Report

Towards Global Optimality for Practical Average Reward Reinforcement Learning without Mixing Time Oracles

Authors: Patel, Bhrij; Suttle, Wesley A.; Koppel, Alec; Aggarwal, Vaneet; Sadler, Brian M.; Bedi, Amrit Singh; Manocha, Dinesh

In the context of average-reward reinforcement learning, the requirement for oracle knowledge of the mixing time, a measure of the duration a Markov chain under a fixed policy needs to achieve its stationary distribution, poses a significant challeng

External link: http://arxiv.org/abs/2403.11925

Report

Independent RL for Cooperative-Competitive Agents: A Mean-Field Perspective

Authors: Zaman, Muhammad Aneeq uz; Koppel, Alec; Laurière, Mathieu; Başar, Tamer

We address in this paper Reinforcement Learning (RL) among agents that are grouped into teams such that there is cooperation within each team but general-sum (non-zero sum) competition across different teams. To develop an RL method that provably ach

External link: http://arxiv.org/abs/2403.11345
