Showing 1 - 10 of 3,376 for the search: '"RUSSELL, STUART"'
Pretrained language models (LMs) can generalize to implications of facts that they are finetuned on. For example, if finetuned on "John Doe lives in Tokyo," LMs can correctly answer "What language do the people in John Doe's city speak?" with "Japanese." …
External link:
http://arxiv.org/abs/2412.04614
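A minimal sketch of the setup this abstract describes, with hypothetical data (not the paper's pipeline): a finetuning fact and an "implication" probe that is never seen verbatim during finetuning.

```python
# Hypothetical finetuning example: a single fact about an entity.
finetune_example = {
    "prompt": "Where does John Doe live?",
    "completion": "John Doe lives in Tokyo.",
}

# Held-out evaluation probe: answering it requires composing the finetuned
# fact (John Doe -> Tokyo) with background knowledge (Tokyo -> Japanese).
implication_probe = {
    "prompt": "What language do the people in John Doe's city speak?",
    "expected": "Japanese",
}

def is_correct(model_answer: str) -> bool:
    """Loose string match, used only for this illustration."""
    return implication_probe["expected"].lower() in model_answer.lower()

print(is_correct("They speak Japanese."))
```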
A wide variety of goals could cause an AI to disable its off switch because "you can't fetch the coffee if you're dead" (Russell 2019). Prior theoretical work on this shutdown problem assumes that humans know everything that AIs do. In practice, however, …
External link:
http://arxiv.org/abs/2411.17749
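For context, a toy numerical sketch of the classic off-switch incentive (in the spirit of Hadfield-Menell et al.'s off-switch game, with illustrative numbers, not this paper's partial-observability setting): a robot uncertain about the human's utility does better by leaving the off switch usable.

```python
import numpy as np

# The robot is uncertain about the human's utility u for its proposed action.
rng = np.random.default_rng(0)
u_samples = rng.normal(loc=0.2, scale=1.0, size=100_000)  # robot's belief over u

ev_act = u_samples.mean()                      # act now, ignoring the off switch
ev_off = 0.0                                   # switch itself off immediately
ev_defer = np.maximum(u_samples, 0.0).mean()   # defer: human vetoes when u < 0

print(f"act: {ev_act:.3f}  off: {ev_off:.3f}  defer: {ev_defer:.3f}")
# Under uncertainty, deferring dominates acting or shutting down, which is the
# incentive structure the shutdown problem revolves around.
```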
Author:
Bengio, Yoshua, Mindermann, Sören, Privitera, Daniel, Besiroglu, Tamay, Bommasani, Rishi, Casper, Stephen, Choi, Yejin, Goldfarb, Danielle, Heidari, Hoda, Khalatbari, Leila, Longpre, Shayne, Mavroudis, Vasilios, Mazeika, Mantas, Ng, Kwan Yee, Okolo, Chinasa T., Raji, Deborah, Skeadas, Theodora, Tramèr, Florian, Adekanmbi, Bayo, Christiano, Paul, Dalrymple, David, Dietterich, Thomas G., Felten, Edward, Fung, Pascale, Gourinchas, Pierre-Olivier, Jennings, Nick, Krause, Andreas, Liang, Percy, Ludermir, Teresa, Marda, Vidushi, Margetts, Helen, McDermid, John A., Narayanan, Arvind, Nelson, Alondra, Oh, Alice, Ramchurn, Gopal, Russell, Stuart, Schaake, Marietje, Song, Dawn, Soto, Alvaro, Tiedrich, Lee, Varoquaux, Gaël, Yao, Andrew, Zhang, Ya-Qin
This is the interim publication of the first International Scientific Report on the Safety of Advanced AI. The report synthesises the scientific understanding of general-purpose AI -- AI that can perform a wide variety of tasks -- with a focus on understanding …
External link:
http://arxiv.org/abs/2412.05282
Learning from human feedback has gained traction in fields like robotics and natural language processing in recent years. While prior works mostly rely on human feedback in the form of comparisons, language is a preferable modality that provides more …
External link:
http://arxiv.org/abs/2410.06401
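As background for the comparison-based feedback this abstract contrasts with language feedback, a minimal sketch of the standard Bradley-Terry preference loss used to fit a reward model to pairwise comparisons (illustrative numbers, not this paper's method):

```python
import numpy as np

def bradley_terry_loss(r_preferred: np.ndarray, r_rejected: np.ndarray) -> float:
    """Negative log-likelihood that the preferred item beats the rejected one
    under a Bradley-Terry model: P(pref > rej) = sigmoid(r_pref - r_rej)."""
    logits = r_preferred - r_rejected
    return float(np.mean(np.log1p(np.exp(-logits))))  # mean of -log sigmoid(logits)

# Toy reward-model scores for three comparison pairs (hypothetical values).
print(bradley_terry_loss(np.array([1.2, 0.4, 2.0]), np.array([0.3, 0.9, 1.1])))
```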
In reinforcement learning, if the agent's reward differs from the designers' true utility, even only rarely, the state distribution resulting from the agent's policy can be very bad, in theory and in practice. When RL policies would devolve into unde…
External link:
http://arxiv.org/abs/2410.06213
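A tiny illustration of the failure mode named in the first sentence, with hypothetical numbers: a proxy reward that disagrees with true utility only in one rare state, yet the proxy-optimal policy steers the state distribution toward exactly that state.

```python
# Proxy reward vs. designers' true utility, disagreeing only on a rare state.
states = ["normal", "rare"]
true_utility = {"normal": 1.0, "rare": -100.0}   # the rare state is catastrophic
proxy_reward = {"normal": 1.0, "rare": 2.0}      # ...but the proxy prefers it

# Two policies, summarized by the state distributions they induce.
policies = {
    "ignores_rare_state": {"normal": 0.999, "rare": 0.001},
    "seeks_rare_state":   {"normal": 0.100, "rare": 0.900},
}

for name, dist in policies.items():
    proxy = sum(dist[s] * proxy_reward[s] for s in states)
    true = sum(dist[s] * true_utility[s] for s in states)
    print(f"{name:20s} proxy={proxy:7.3f} true={true:8.3f}")
# The policy with the higher proxy return has far lower true utility.
```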
Intrinsic motivation (IM) and reward shaping are common methods for guiding the exploration of reinforcement learning (RL) agents by adding pseudo-rewards. Designing these rewards is challenging, however, and they can counter-intuitively harm performance …
External link:
http://arxiv.org/abs/2409.05358
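For reference, the standard safe form of such pseudo-rewards is potential-based reward shaping (Ng et al., 1999), which leaves the optimal policy unchanged; a minimal sketch with a hypothetical gridworld potential:

```python
def shaped_reward(r: float, s, s_next, potential, gamma: float = 0.99) -> float:
    """Potential-based shaping: r' = r + gamma * phi(s') - phi(s).
    This additive pseudo-reward does not change which policies are optimal."""
    return r + gamma * potential(s_next) - potential(s)

# Hypothetical potential: negative Manhattan distance to a goal cell.
goal = (3, 3)
potential = lambda s: -(abs(s[0] - goal[0]) + abs(s[1] - goal[1]))

# Moving one step toward the goal earns a small positive pseudo-reward.
print(shaped_reward(0.0, s=(0, 0), s_next=(0, 1), potential=potential))
```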
Author:
Noël, Jean-Christophe
Published in:
Politique étrangère, 2020 Dec 01. 85(4), 202-203.
External link:
https://www.jstor.org/stable/48643822
Language models are susceptible to bias, sycophancy, backdoors, and other tendencies that lead to unfaithful responses to the input context. Interpreting internal states of language models could help monitor and correct unfaithful behavior. We hypothesize …
External link:
http://arxiv.org/abs/2406.19501
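One common way to "interpret internal states" is to train a linear probe on hidden activations; a minimal sketch with random placeholder activations standing in for real model states (not this paper's specific method):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder data: in practice these would be hidden-layer activations from
# the language model, labeled faithful (0) vs. unfaithful (1).
rng = np.random.default_rng(0)
activations = rng.normal(size=(200, 64))          # 200 examples, 64-dim states
labels = (activations[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)

probe = LogisticRegression(max_iter=1000).fit(activations, labels)
print("probe accuracy:", probe.score(activations, labels))
```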
Author:
Jenner, Erik, Kapur, Shreyas, Georgiev, Vasil, Allen, Cameron, Emmons, Scott, Russell, Stuart
Do neural networks learn to implement algorithms such as look-ahead or search "in the wild"? Or do they rely purely on collections of simple heuristics? We present evidence of learned look-ahead in the policy network of Leela Chess Zero, the currently …
External link:
http://arxiv.org/abs/2406.00877
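A sketch of the style of causal test used to look for such algorithms: activation patching, shown here on a toy MLP as a stand-in (not Leela Chess Zero's network or this paper's exact procedure). An intermediate activation from a "clean" input is spliced into a run on a "corrupted" input, and the output is compared against the clean output.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
layer = model[1]  # intervene on the hidden activation after the ReLU

cache = {}
def save_hook(_module, _inputs, output):
    cache["clean"] = output.detach().clone()

def patch_hook(_module, _inputs, output):
    patched = output.clone()
    patched[:, :8] = cache["clean"][:, :8]  # splice in half of the clean units
    return patched

clean, corrupted = torch.randn(1, 8), torch.randn(1, 8)

handle = layer.register_forward_hook(save_hook)
clean_out = model(clean)          # cache the clean activation
handle.remove()

baseline_out = model(corrupted)   # corrupted run, no intervention

handle = layer.register_forward_hook(patch_hook)
patched_out = model(corrupted)    # corrupted run with the clean splice
handle.remove()

print("corrupted vs clean:", torch.dist(baseline_out, clean_out).item())
print("patched   vs clean:", torch.dist(patched_out, clean_out).item())
```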
Large language models generate code one token at a time. Their autoregressive generation process lacks the feedback of observing the program's output. Training LLMs to suggest edits directly can be challenging due to the scarcity of rich edit data. …
External link:
http://arxiv.org/abs/2405.20519
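To make the "one token at a time" point concrete, a minimal greedy decoding loop with a stub scoring function standing in for a real LLM (hypothetical vocabulary and logits, not this paper's system):

```python
import numpy as np

VOCAB = ["print", "(", "'hi'", ")", "\n", "<eos>"]

def next_token_logits(prefix: list[str]) -> np.ndarray:
    """Stub for an LLM's next-token distribution (hypothetical)."""
    rng = np.random.default_rng(len(prefix))
    return rng.normal(size=len(VOCAB))

def generate(max_tokens: int = 10) -> list[str]:
    tokens: list[str] = []
    for _ in range(max_tokens):
        # One token at a time; the partial program is never executed,
        # which is the missing feedback signal the abstract refers to.
        tok = VOCAB[int(np.argmax(next_token_logits(tokens)))]
        if tok == "<eos>":
            break
        tokens.append(tok)
    return tokens

print("".join(generate()))
```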