Showing 1 - 10 of 108 for search: '"Pires, Bernardo A."'
Author:
Khetarpal, Khimya, Guo, Zhaohan Daniel, Pires, Bernardo Avila, Tang, Yunhao, Lyle, Clare, Rowland, Mark, Heess, Nicolas, Borsa, Diana, Guez, Arthur, Dabney, Will
Learning a good representation is a crucial challenge for Reinforcement Learning (RL) agents. Self-predictive learning provides a means to jointly learn a latent representation and dynamics model by bootstrapping from future latent representations (BYO…
External link:
http://arxiv.org/abs/2406.02035
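Self-predictive objectives of the kind described in the entry above are commonly implemented by having an online encoder and a latent transition model predict the next-step latent produced by a slowly updated target copy of the encoder. The snippet below is a minimal, hypothetical sketch of that style of objective; the network shapes, names, cosine loss, and EMA update are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of a self-predictive (BYOL-style) latent objective, illustrative only.
# An online encoder plus a latent transition model predict the TARGET encoder's
# embedding of the next observation; the target is an EMA copy of the encoder.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, act_dim, latent_dim = 16, 4, 32

encoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
transition = nn.Sequential(nn.Linear(latent_dim + act_dim, 64), nn.ReLU(),
                           nn.Linear(64, latent_dim))
target_encoder = copy.deepcopy(encoder)  # bootstrap target, no gradient updates
for p in target_encoder.parameters():
    p.requires_grad_(False)

def self_predictive_loss(obs, action, next_obs):
    z = encoder(obs)
    z_pred = transition(torch.cat([z, action], dim=-1))
    with torch.no_grad():
        z_target = target_encoder(next_obs)
    # Negative cosine similarity between predicted and bootstrapped next latents.
    return -F.cosine_similarity(z_pred, z_target, dim=-1).mean()

@torch.no_grad()
def update_target(tau=0.01):
    # Exponential moving average of the online encoder into the target encoder.
    for p, p_t in zip(encoder.parameters(), target_encoder.parameters()):
        p_t.mul_(1.0 - tau).add_(tau * p)
```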
Author:
Richemond, Pierre Harvey, Tang, Yunhao, Guo, Daniel, Calandriello, Daniele, Azar, Mohammad Gheshlaghi, Rafailov, Rafael, Pires, Bernardo Avila, Tarassov, Eugene, Spangher, Lucas, Ellsworth, Will, Severyn, Aliaksei, Mallinson, Jonathan, Shani, Lior, Shamir, Gil, Joshi, Rishabh, Liu, Tianqi, Munos, Remi, Piot, Bilal
The dominant framework for alignment of large language models (LLMs), whether through reinforcement learning from human feedback or direct preference optimisation, is to learn from preference data. This involves building datasets where each element is…
External link:
http://arxiv.org/abs/2405.19107
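The snippet above is cut off just as it begins to describe what an element of a preference dataset contains. As a generic point of reference only (this is a hypothetical schema, not the paper's), such an element typically bundles a prompt with a preferred and a dispreferred response:

```python
# Generic, illustrative structure of one preference-data element
# (hypothetical field names; not the schema used in the paper above).
from dataclasses import dataclass

@dataclass
class PreferenceExample:
    prompt: str     # the input the model responded to
    chosen: str     # response judged better by the annotator or reward model
    rejected: str   # response judged worse for the same prompt

example = PreferenceExample(
    prompt="Explain gradient descent in one sentence.",
    chosen="Gradient descent iteratively updates parameters against the loss gradient.",
    rejected="Gradient descent is when the computer guesses randomly until it works.",
)
```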
Author:
Tang, Yunhao, Guo, Daniel Zhaohan, Zheng, Zeyu, Calandriello, Daniele, Cao, Yuan, Tarassov, Eugene, Munos, Rémi, Pires, Bernardo Ávila, Valko, Michal, Cheng, Yong, Dabney, Will
Reinforcement learning from human feedback (RLHF) is the canonical framework for large language model alignment. However, the rising popularity of offline alignment algorithms challenges the need for on-policy sampling in RLHF. Within the context of rewar…
External link:
http://arxiv.org/abs/2405.08448
Author:
Calandriello, Daniele, Guo, Daniel, Munos, Remi, Rowland, Mark, Tang, Yunhao, Pires, Bernardo Avila, Richemond, Pierre Harvey, Lan, Charline Le, Valko, Michal, Liu, Tianqi, Joshi, Rishabh, Zheng, Zeyu, Piot, Bilal
Ensuring alignment of language models' outputs with human preferences is critical to guarantee a useful, safe, and pleasant user experience. Thus, human alignment has been extensively studied recently, and several methods such as Reinforcement Learnin…
External link:
http://arxiv.org/abs/2403.08635
We introduce off-policy distributional Q($\lambda$), a new addition to the family of off-policy distributional evaluation algorithms. Off-policy distributional Q($\lambda$) does not apply importance sampling for off-policy learning, which introduces…
External link:
http://arxiv.org/abs/2402.05766
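For orientation on the entry above: the classical, expected-value Q($\lambda$) evaluation operator already avoids importance-sampling ratios by weighting temporal-difference errors along behaviour trajectories with a plain $(\gamma\lambda)^t$ trace; convergence of this uncorrected operator is typically only guaranteed when $\lambda$ is small relative to the mismatch between the target policy $\pi$ and the behaviour policy $\mu$. A hedged sketch of that standard (non-distributional) operator, which the paper extends to return distributions, is

$$ \mathcal{T}_\lambda Q(x,a) \;=\; Q(x,a) \;+\; \mathbb{E}_\mu\!\left[\sum_{t \ge 0} (\gamma\lambda)^t \Big(r_t + \gamma\,\mathbb{E}_{b \sim \pi}\, Q(x_{t+1}, b) - Q(x_t, a_t)\Big)\right], $$

with $(x_0, a_0) = (x, a)$ and subsequent actions drawn from $\mu$; the distributional version proposed in the paper itself is not reproduced here.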
Author:
Tang, Yunhao, Guo, Zhaohan Daniel, Zheng, Zeyu, Calandriello, Daniele, Munos, Rémi, Rowland, Mark, Richemond, Pierre Harvey, Valko, Michal, Pires, Bernardo Ávila, Piot, Bilal
Offline preference optimization allows fine-tuning large models directly from offline data, and has proved effective in recent alignment practices. We propose generalized preference optimization (GPO), a family of offline losses parameterized by a ge…
External link:
http://arxiv.org/abs/2402.05749
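The GPO entry above describes a family of offline preference losses indexed by a choice of loss-shaping function. A minimal, hypothetical sketch of such a loss is given below, written as a convex function applied to the scaled difference of policy-vs-reference log-ratios between the chosen and rejected responses; the exact parameterization in the paper may differ, and the names and default logistic choice are assumptions.

```python
# Hypothetical sketch of a GPO-style offline preference loss: a convex function f
# applied to the scaled log-ratio margin between chosen and rejected responses.
import torch
import torch.nn.functional as F

def gpo_style_loss(logp_chosen, logp_rejected,
                   ref_logp_chosen, ref_logp_rejected,
                   beta=0.1, f=lambda t: F.softplus(-t)):
    """logp_* : sequence log-probabilities under the policy being trained;
    ref_logp_* : the same quantities under a frozen reference model."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return f(beta * margin).mean()

# Different convex choices of f recover familiar offline losses, e.g.:
logistic = lambda t: F.softplus(-t)   # log(1 + exp(-t)), DPO-style
hinge    = lambda t: F.relu(1.0 - t)  # SLiC-style hinge
```

With the logistic choice the sketch reduces to a DPO-style objective; swapping f changes how strongly the loss saturates once a pair is confidently ranked.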
Author:
Chen, Zhaoxi, Moon, Gyeongsik, Guo, Kaiwen, Cao, Chen, Pidhorskyi, Stanislav, Simon, Tomas, Joshi, Rohan, Dong, Yuan, Xu, Yichen, Pires, Bernardo, Wen, He, Evans, Lucas, Peng, Bo, Buffalini, Julia, Trimble, Autumn, McPhail, Kevyn, Schoeller, Melissa, Yu, Shoou-I, Romero, Javier, Zollhöfer, Michael, Sheikh, Yaser, Liu, Ziwei, Saito, Shunsuke
Existing photorealistic relightable hand models require extensive identity-specific observations in different views, poses, and illuminations, and face challenges in generalizing to natural illuminations and novel identities. To bridge this gap, we p…
External link:
http://arxiv.org/abs/2401.05334
Author:
Tang, Yunhao, Kozuno, Tadashi, Rowland, Mark, Harutyunyan, Anna, Munos, Rémi, Pires, Bernardo Ávila, Valko, Michal
Multi-step learning applies lookahead over multiple time steps and has proved valuable in policy evaluation settings. However, in the optimal control case, the impact of multi-step learning has been relatively limited despite a number of prior effort…
External link:
http://arxiv.org/abs/2305.18501
Author:
Lyle, Clare, Zheng, Zeyu, Nikishin, Evgenii, Pires, Bernardo Avila, Pascanu, Razvan, Dabney, Will
Plasticity, the ability of a neural network to quickly change its predictions in response to new information, is essential for the adaptability and robustness of deep reinforcement learning systems. Deep neural networks are known to lose plasticity o…
External link:
http://arxiv.org/abs/2303.01486
Author:
Pires, Bernardo Avila, Behbahani, Feryal, Soyer, Hubert, Nikiforou, Kyriacos, Keck, Thomas, Singh, Satinder
Hierarchical Reinforcement Learning (HRL) agents have the potential to demonstrate appealing capabilities such as planning and exploration with abstraction, transfer, and skill reuse. Recent successes with HRL across different domains provide evidenc…
External link:
http://arxiv.org/abs/2302.14451