Search results: "Tarassov, Eugene"

Report

Offline Regularised Reinforcement Learning for Large Language Models Alignment

Authors: Richemond, Pierre Harvey; Tang, Yunhao; Guo, Daniel; Calandriello, Daniele; Azar, Mohammad Gheshlaghi; Rafailov, Rafael; Pires, Bernardo Avila; Tarassov, Eugene; Spangher, Lucas; Ellsworth, Will; Severyn, Aliaksei; Mallinson, Jonathan; Shani, Lior; Shamir, Gil; Joshi, Rishabh; Liu, Tianqi; Munos, Remi; Piot, Bilal

The dominant framework for alignment of large language models (LLM), whether through reinforcement learning from human feedback or direct preference optimisation, is to learn from preference data. This involves building datasets where each element is

External link: http://arxiv.org/abs/2405.19107

Report

Understanding the performance gap between online and offline alignment algorithms

Authors: Tang, Yunhao; Guo, Daniel Zhaohan; Zheng, Zeyu; Calandriello, Daniele; Cao, Yuan; Tarassov, Eugene; Munos, Rémi; Pires, Bernardo Ávila; Valko, Michal; Cheng, Yong; Dabney, Will

Reinforcement learning from human feedback (RLHF) is the canonical framework for large language model alignment. However, the rising popularity of offline alignment algorithms challenges the need for on-policy sampling in RLHF. Within the context of rewar

External link: http://arxiv.org/abs/2405.08448

Report

Developing, Evaluating and Scaling Learning Agents in Multi-Agent Environments

The Game Theory & Multi-Agent team at DeepMind studies several aspects of multi-agent learning ranging from computing approximations to fundamental concepts in game theory to simulating social dilemmas in rich spatial environments and training 3-d hu

External link: http://arxiv.org/abs/2209.10958

Report

Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning

We introduce DeepNash, an autonomous agent capable of learning to play the imperfect information game Stratego from scratch, up to a human expert level. Stratego is one of the few iconic board games that Artificial Intelligence (AI) has not yet maste

External link: http://arxiv.org/abs/2206.15378

Report

Time-series Imputation of Temporally-occluded Multiagent Trajectories

Authors: Omidshafiei, Shayegan; Hennes, Daniel; Garnelo, Marta; Tarassov, Eugene; Wang, Zhe; Elie, Romuald; Connor, Jerome T.; Muller, Paul; Graham, Ian; Spearman, William; Tuyls, Karl

In multiagent environments, several decision-making individuals interact while adhering to the dynamics constraints imposed by the environment. These interactions, combined with the potential stochasticity of the agents' decision-making processes, ma

External link: http://arxiv.org/abs/2106.04219
