Zobrazeno 1 - 7
of 7
pro vyhledávání: '"Christoffersen, Phillip"'
As society deploys more and more sophisticated artificial intelligence (AI) agents, it will be increasingly necessary for such agents, while pursuing their own objectives, to coexist in common environments in the physical or digital worlds. This may
Externí odkaz:
https://hdl.handle.net/1721.1/153795
Autor:
Casper, Stephen, Davies, Xander, Shi, Claudia, Gilbert, Thomas Krendl, Scheurer, Jérémy, Rando, Javier, Freedman, Rachel, Korbak, Tomasz, Lindner, David, Freire, Pedro, Wang, Tony, Marks, Samuel, Segerie, Charbel-Raphaël, Carroll, Micah, Peng, Andi, Christoffersen, Phillip, Damani, Mehul, Slocum, Stewart, Anwar, Usman, Siththaranjan, Anand, Nadeau, Max, Michaud, Eric J., Pfau, Jacob, Krasheninnikov, Dmitrii, Chen, Xin, Langosco, Lauro, Hase, Peter, Bıyık, Erdem, Dragan, Anca, Krueger, David, Sadigh, Dorsa, Hadfield-Menell, Dylan
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals. RLHF has emerged as the central method used to finetune state-of-the-art large language models (LLMs). Despite this popularity, there
Externí odkaz:
http://arxiv.org/abs/2307.15217
Many real-world reinforcement learning (RL) problems necessitate learning complex, temporally extended behavior that may only receive reward signal when the behavior is completed. If the reward-worthy behavior is known, it can be specified in terms o
Externí odkaz:
http://arxiv.org/abs/2301.02952
Multi-agent Reinforcement Learning (MARL) is a powerful tool for training autonomous agents acting independently in a common environment. However, it can lead to sub-optimal behavior when individual incentives and group incentives diverge. Humans are
Externí odkaz:
http://arxiv.org/abs/2208.10469
Autor:
Icarte, Rodrigo Toro, Valenzano, Richard, Klassen, Toryn Q., Christoffersen, Phillip, Farahmand, Amir-massoud, McIlraith, Sheila A.
Reinforcement Learning (RL) agents typically learn memoryless policies---policies that only consider the last observation when selecting actions. Learning memoryless policies is efficient and optimal in fully observable environments. However, some fo
Externí odkaz:
http://arxiv.org/abs/2010.01753
Publikováno v:
Autonomous Agents & Multi-Agent Systems; Dec2024, Vol. 38 Issue 2, p1-38, 38p
Multi-agent reinforcement learning (MARL) is a powerful tool for training automated systems acting independently in a common environment. However, it can lead to sub-optimal behavior when individual incentives and group incentives diverge. Humans are
Externí odkaz:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::0f76d7046a4c3181bb52437897948148
http://arxiv.org/abs/2208.10469
http://arxiv.org/abs/2208.10469