Showing 1 - 6 of 6 for search: '"Siththaranjan, Anand"'
Existing AI alignment approaches assume that preferences are static, which is unrealistic: our preferences change, and may even be influenced by our interactions with AI systems themselves. To clarify the consequences of incorrectly assuming static preferences…
External link:
http://arxiv.org/abs/2405.17713
Autonomous agents should be able to coordinate with other agents without knowing their intents ahead of time. While prior work has studied how agents can gather information about the intent of others, in this work, we study the inverse problem: how a…
External link:
http://arxiv.org/abs/2402.10182
In practice, preference learning from human feedback depends on incomplete data with hidden context. Hidden context refers to data that affects the feedback received, but which is not represented in the data used to train a preference model. This cap…
External link:
http://arxiv.org/abs/2312.08358
Authors:
Casper, Stephen, Davies, Xander, Shi, Claudia, Gilbert, Thomas Krendl, Scheurer, Jérémy, Rando, Javier, Freedman, Rachel, Korbak, Tomasz, Lindner, David, Freire, Pedro, Wang, Tony, Marks, Samuel, Segerie, Charbel-Raphaël, Carroll, Micah, Peng, Andi, Christoffersen, Phillip, Damani, Mehul, Slocum, Stewart, Anwar, Usman, Siththaranjan, Anand, Nadeau, Max, Michaud, Eric J., Pfau, Jacob, Krasheninnikov, Dmitrii, Chen, Xin, Langosco, Lauro, Hase, Peter, Bıyık, Erdem, Dragan, Anca, Krueger, David, Sadigh, Dorsa, Hadfield-Menell, Dylan
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals. RLHF has emerged as the central method used to finetune state-of-the-art large language models (LLMs). Despite this popularity, there…
External link:
http://arxiv.org/abs/2307.15217
Authors:
Westenbroek, Tyler, Siththaranjan, Anand, Sarwari, Mohsin, Tomlin, Claire J., Sastry, Shankar S.
Optimal control is an essential tool for stabilizing complex nonlinear systems. However, despite the extensive impacts of methods such as receding horizon control, dynamic programming and reinforcement learning, the design of cost functions for a particular…
External link:
http://arxiv.org/abs/2204.01986
Predictive human models often need to adapt their parameters online from human data. This raises previously ignored safety-related questions for robots relying on these models, such as what the model could learn online and how quickly it could learn it…
External link:
http://arxiv.org/abs/2103.05746