Showing 1 - 10 of 274 for search: '"Munos, R."'
We consider stochastic sequential learning problems where the learner can observe the average reward of several actions. Such a setting is interesting in many applications involving monitoring and surveillance, where the set of the actions t…
External link:
http://arxiv.org/abs/1506.04782
Published in:
International Conference on Machine Learning, Jul 2021, Vienna / Virtual, Austria
In practical reinforcement learning (RL), the discount factor used for estimating value functions often differs from that used for defining the evaluation objective. In this work, we study the effect that this discrepancy of discount factors has duri…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::3f98f1f1f5160afc34988874a0e26114
https://hal.inria.fr/hal-03289295/document
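For readers skimming this record, a minimal toy (illustrative names only, not the paper's analysis) of the discount mismatch the abstract describes: the same reward sequence is valued once under the discount used for estimation and once under the discount used for evaluation.

```python
# Toy illustration of estimating with one discount while evaluating with
# another; `td_evaluate` and the reward sequence are made up for this sketch.
def td_evaluate(rewards, gamma):
    """Discounted return of a fixed reward sequence under discount `gamma`."""
    return sum(r * gamma**t for t, r in enumerate(rewards))

rewards = [1.0, 0.0, 0.0, 5.0]           # a toy episode
gamma_train, gamma_eval = 0.9, 0.99      # estimation vs. evaluation discounts

print(td_evaluate(rewards, gamma_train))  # value the learner optimizes
print(td_evaluate(rewards, gamma_eval))   # value the evaluator measures
```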
Author:
Kozuno, Tadashi, Tang, Yunhao, Rowland, Mark, Munos, Rémi, Kapturowski, Steven, Dabney, Will, Valko, Michal, Abel, David
Off-policy multi-step reinforcement learning algorithms consist of conservative and non-conservative algorithms: the former actively cut traces, whereas the latter do not. Recently, Munos et al. (2016) proved the convergence of conservative algorithm…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_________::9a439e01ee6907a12731ec95b67d0833
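The "trace cutting" this abstract refers to can be illustrated with the truncated importance ratios of Retrace(λ) from Munos et al. (2016), c_t = λ min(1, π(a_t|x_t)/μ(a_t|x_t)). The sketch below is a toy with illustrative names, not this paper's algorithm:

```python
import numpy as np

def retrace_coefficients(pi_probs, mu_probs, lam=0.95):
    """Truncated per-step trace coefficients c_t = lam * min(1, pi/mu)."""
    ratios = np.asarray(pi_probs) / np.asarray(mu_probs)
    return lam * np.minimum(1.0, ratios)

# Behaviour policy mu generated the data; pi is the target policy. Traces are
# cut (shrunk below lam) exactly where pi puts less mass than mu.
pi_probs = [0.9, 0.1, 0.5]
mu_probs = [0.5, 0.4, 0.5]
print(retrace_coefficients(pi_probs, mu_probs))  # [0.95, 0.2375, 0.95]
```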
Author:
Mesnard, Thomas, Weber, Théophane, Viola, Fabio, Thakoor, Shantanu, Saade, Alaa, Harutyunyan, Anna, Dabney, Will, Stepleton, Tom, Heess, Nicolas, Guez, Arthur, Moulines, Éric, Hutter, Marcus, Buesing, Lars, Munos, Rémi
Credit assignment in reinforcement learning is the problem of measuring an action's influence on future rewards. In particular, this requires separating skill from luck, i.e. disentangling the effect of an action on rewards from that of external fact…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::817906e26ce7fc8ab007293af0ae9371
http://arxiv.org/abs/2011.09464
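As background for the skill-versus-luck framing (this is the classic device, not this paper's counterfactual method), baseline subtraction discounts some of the luck: an action is credited only with its advantage over the state's value.

```python
# Background sketch: credit an action with A(s, a) = Q(s, a) - V(s),
# so value the agent would have obtained anyway is not attributed to it.
def advantage(q_value, state_value):
    """How much better the chosen action was than the state's average."""
    return q_value - state_value

# If the state was already worth 10 and the action's return was 12,
# only the 2-point difference is credited to the action itself.
print(advantage(12.0, 10.0))  # 2.0
```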
Author:
Gruslys, Audrūnas, Lanctot, Marc, Munos, Rémi, Timbers, Finbarr, Schmid, Martin, Perolat, Julien, Morrill, Dustin, Zambaldi, Vinicius, Lespiau, Jean-Baptiste, Schultz, John, Azar, Mohammad Gheshlaghi, Bowling, Michael, Tuyls, Karl
Regret minimization has played a key role in online learning, equilibrium computation in games, and reinforcement learning (RL). In this paper, we describe a general model-free RL method for no-regret learning based on repeated reconsideration of pas…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::ef7294a22de76f4ec622293b8679a0da
http://arxiv.org/abs/2008.12234
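For context on the no-regret learning this abstract mentions, here is a minimal sketch of regret matching, the classic rule this line of work builds on (an assumed illustration, not the paper's full method): play each action with probability proportional to its positive cumulative regret.

```python
import numpy as np

def regret_matching_policy(cum_regrets):
    """Policy proportional to positive cumulative regrets (uniform if none)."""
    positive = np.maximum(cum_regrets, 0.0)
    total = positive.sum()
    if total <= 0.0:                       # no positive regret: explore evenly
        return np.full_like(cum_regrets, 1.0 / len(cum_regrets))
    return positive / total

# Actions we regret not playing more get proportionally more probability.
print(regret_matching_policy(np.array([3.0, -1.0, 1.0])))  # [0.75, 0.0, 0.25]
```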
Published in:
Applied Numerical Mathematics, 2006, 56(9):1147-1162
Author:
Rowland, Mark, Dadashi, Robert, Kumar, Saurabh, Munos, Rémi, Bellemare, Marc G., Dabney, Will
We present a unifying framework for designing and analysing distributional reinforcement learning (DRL) algorithms in terms of recursively estimating statistics of the return distribution. Our key insight is that DRL algorithms can be decomposed as t…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::0019fd468b91857787b199ad94c22129
http://arxiv.org/abs/1902.08102
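A hedged toy of the "recursively estimating statistics" view: rather than tracking a full return distribution, track a statistic and push it through the Bellman recursion Z(s) = R + γZ(s'). The deterministic mean/variance update below is an illustration with made-up names, not the paper's framework.

```python
def bellman_mean_var(reward, gamma, next_mean, next_var):
    """Push mean and variance of the return through one deterministic step."""
    mean = reward + gamma * next_mean
    var = gamma**2 * next_var        # reward and transition assumed fixed
    return mean, var

# Mean: 1 + 0.9 * 5 = 5.5; variance shrinks by gamma^2: 0.81 * 4 = 3.24.
print(bellman_mean_var(1.0, 0.9, next_mean=5.0, next_var=4.0))
```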
Author:
Azar, Mohammad Gheshlaghi, Piot, Bilal, Pires, Bernardo Avila, Grill, Jean-Bastien, Altché, Florent, Munos, Rémi
As humans, we are driven by a strong desire for seeking novelty in our world. Also, upon observing a novel pattern, we are capable of refining our understanding of the world based on the new information: humans can discover their world. The outstanding…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::3d2fbbf7765eff68c5103fe41a1ead4c
http://arxiv.org/abs/1902.07685
Author:
Guo, Zhaohan Daniel, Azar, Mohammad Gheshlaghi, Piot, Bilal, Pires, Bernardo A., Munos, Rémi
Unsupervised representation learning has achieved excellent results in many applications. It is an especially powerful tool for learning a good representation of environments with partial or noisy observations. In partially observable domains it is…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::f4f033aabada41d7c56275ff6ce39123
http://arxiv.org/abs/1811.06407
Distributional approaches to value-based reinforcement learning model the entire distribution of returns, rather than just their expected values, and have recently been shown to yield state-of-the-art empirical performance. This was demonstrated by t…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::fda9e8cff9f6a0b6d135f5086598f98d
http://arxiv.org/abs/1802.08163
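The categorical representation behind C51-style algorithms, which this line of work analyses, can be sketched as a fixed grid of atoms whose Bellman-shifted mass is projected back onto the support. The snippet below is an assumed minimal illustration, not the paper's analysis.

```python
import numpy as np

def categorical_projection(atoms, probs, reward, gamma):
    """Project the distribution of reward + gamma*Z onto the fixed atom grid."""
    v_min, v_max = atoms[0], atoms[-1]
    dz = atoms[1] - atoms[0]
    target = np.clip(reward + gamma * atoms, v_min, v_max)  # shifted atoms
    projected = np.zeros_like(probs)
    for t, p in zip(target, probs):   # split each atom's mass between neighbours
        b = (t - v_min) / dz
        lo, hi = int(np.floor(b)), int(np.ceil(b))
        if lo == hi:                  # target falls exactly on an atom
            projected[lo] += p
        else:
            projected[lo] += p * (hi - b)
            projected[hi] += p * (b - lo)
    return projected

atoms = np.linspace(-10.0, 10.0, 11)  # fixed support with spacing 2.0
probs = np.full(11, 1.0 / 11)         # uniform toy return distribution
print(categorical_projection(atoms, probs, reward=1.0, gamma=0.9))
```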