Showing 1 - 6 of 6 for search: '"Siththaranjan, Anand"'
Existing AI alignment approaches assume that preferences are static, which is unrealistic: our preferences change, and may even be influenced by our interactions with AI systems themselves. To clarify the consequences of incorrectly assuming static preferences…
External link:
http://arxiv.org/abs/2405.17713
Autonomous agents should be able to coordinate with other agents without knowing their intents ahead of time. While prior work has studied how agents can gather information about the intent of others, in this work, we study the inverse problem: how a…
External link:
http://arxiv.org/abs/2402.10182
In practice, preference learning from human feedback depends on incomplete data with hidden context. Hidden context refers to data that affects the feedback received, but which is not represented in the data used to train a preference model. This cap…
External link:
http://arxiv.org/abs/2312.08358
Authors:
Casper, Stephen, Davies, Xander, Shi, Claudia, Gilbert, Thomas Krendl, Scheurer, Jérémy, Rando, Javier, Freedman, Rachel, Korbak, Tomasz, Lindner, David, Freire, Pedro, Wang, Tony, Marks, Samuel, Segerie, Charbel-Raphaël, Carroll, Micah, Peng, Andi, Christoffersen, Phillip, Damani, Mehul, Slocum, Stewart, Anwar, Usman, Siththaranjan, Anand, Nadeau, Max, Michaud, Eric J., Pfau, Jacob, Krasheninnikov, Dmitrii, Chen, Xin, Langosco, Lauro, Hase, Peter, Bıyık, Erdem, Dragan, Anca, Krueger, David, Sadigh, Dorsa, Hadfield-Menell, Dylan
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals. RLHF has emerged as the central method used to finetune state-of-the-art large language models (LLMs). Despite this popularity, there…
External link:
http://arxiv.org/abs/2307.15217
Authors:
Westenbroek, Tyler, Siththaranjan, Anand, Sarwari, Mohsin, Tomlin, Claire J., Sastry, Shankar S.
Optimal control is an essential tool for stabilizing complex nonlinear systems. However, despite the extensive impacts of methods such as receding horizon control, dynamic programming and reinforcement learning, the design of cost functions for a particular…
External link:
http://arxiv.org/abs/2204.01986
Predictive human models often need to adapt their parameters online from human data. This raises previously ignored safety-related questions for robots relying on these models, such as what the model could learn online and how quickly it could learn it…
External link:
http://arxiv.org/abs/2103.05746