Výsledky vyhledávání - "CLARK, JACK A."

Report

Artificial Intelligence Index Report 2024

Autor: Maslej, Nestor, Fattorini, Loredana, Perrault, Raymond, Parli, Vanessa, Reuel, Anka, Brynjolfsson, Erik, Etchemendy, John, Ligett, Katrina, Lyons, Terah, Manyika, James, Niebles, Juan Carlos, Shoham, Yoav, Wald, Russell, Clark, Jack

The 2024 Index is our most comprehensive to date and arrives at an important moment when AI's influence on society has never been more pronounced. This year, we have broadened our scope to more extensively cover essential trends such as technical adv

Externí odkaz: http://arxiv.org/abs/2405.19522

Zobrazit plný text záznamu

Report

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Humans are capable of strategically deceptive behavior: behaving helpfully in most situations, but then behaving very differently in order to pursue alternative objectives when given the opportunity. If an AI system learned such a deceptive strategy,

Externí odkaz: http://arxiv.org/abs/2401.05566

Zobrazit plný text záznamu

Report

Artificial Intelligence Index Report 2023

Autor: Maslej, Nestor, Fattorini, Loredana, Brynjolfsson, Erik, Etchemendy, John, Ligett, Katrina, Lyons, Terah, Manyika, James, Ngo, Helen, Niebles, Juan Carlos, Parli, Vanessa, Shoham, Yoav, Wald, Russell, Clark, Jack, Perrault, Raymond

Welcome to the sixth edition of the AI Index Report. This year, the report introduces more original data than any previous edition, including a new chapter on AI public opinion, a more thorough technical performance chapter, original analysis about l

Externí odkaz: http://arxiv.org/abs/2310.03715

Zobrazit plný text záznamu

Report

Towards Measuring the Representation of Subjective Global Opinions in Language Models

Autor: Durmus, Esin, Nguyen, Karina, Liao, Thomas I., Schiefer, Nicholas, Askell, Amanda, Bakhtin, Anton, Chen, Carol, Hatfield-Dodds, Zac, Hernandez, Danny, Joseph, Nicholas, Lovitt, Liane, McCandlish, Sam, Sikder, Orowa, Tamkin, Alex, Thamkul, Janel, Kaplan, Jared, Clark, Jack, Ganguli, Deep

Large language models (LLMs) may not equitably represent diverse global perspectives on societal issues. In this paper, we develop a quantitative framework to evaluate whose opinions model-generated responses are more similar to. We first build a dat

Externí odkaz: http://arxiv.org/abs/2306.16388

Zobrazit plný text záznamu

Report

Model evaluation for extreme risks

Current approaches to building general-purpose AI systems tend to produce systems with both beneficial and harmful capabilities. Further progress in AI development could lead to capabilities that pose extreme risks, such as offensive cyber capabiliti

Externí odkaz: http://arxiv.org/abs/2305.15324

Zobrazit plný text záznamu

Report

Regulatory Markets: The Future of AI Governance

Autor: Hadfield, Gillian K., Clark, Jack

Appropriately regulating artificial intelligence is an increasingly urgent policy challenge. Legislatures and regulators lack the specialized knowledge required to best translate public demands into legal requirements. Overreliance on industry self-r

Externí odkaz: http://arxiv.org/abs/2304.04914

Zobrazit plný text záznamu

Report

The Capacity for Moral Self-Correction in Large Language Models

We test the hypothesis that language models trained with reinforcement learning from human feedback (RLHF) have the capability to "morally self-correct" -- to avoid producing harmful outputs -- if instructed to do so. We find strong evidence in suppo

Externí odkaz: http://arxiv.org/abs/2302.07459

Zobrazit plný text záznamu

Report

Discovering Language Model Behaviors with Model-Written Evaluations

Autor: Perez, Ethan, Ringer, Sam, Lukošiūtė, Kamilė, Nguyen, Karina, Chen, Edwin, Heiner, Scott, Pettit, Craig, Olsson, Catherine, Kundu, Sandipan, Kadavath, Saurav, Jones, Andy, Chen, Anna, Mann, Ben, Israel, Brian, Seethor, Bryan, McKinnon, Cameron, Olah, Christopher, Yan, Da, Amodei, Daniela, Amodei, Dario, Drain, Dawn, Li, Dustin, Tran-Johnson, Eli, Khundadze, Guro, Kernion, Jackson, Landis, James, Kerr, Jamie, Mueller, Jared, Hyun, Jeeyoon, Landau, Joshua, Ndousse, Kamal, Goldberg, Landon, Lovitt, Liane, Lucas, Martin, Sellitto, Michael, Zhang, Miranda, Kingsland, Neerav, Elhage, Nelson, Joseph, Nicholas, Mercado, Noemí, DasSarma, Nova, Rausch, Oliver, Larson, Robin, McCandlish, Sam, Johnston, Scott, Kravec, Shauna, Showk, Sheer El, Lanham, Tamera, Telleen-Lawton, Timothy, Brown, Tom, Henighan, Tom, Hume, Tristan, Bai, Yuntao, Hatfield-Dodds, Zac, Clark, Jack, Bowman, Samuel R., Askell, Amanda, Grosse, Roger, Hernandez, Danny, Ganguli, Deep, Hubinger, Evan, Schiefer, Nicholas, Kaplan, Jared

As language models (LMs) scale, they develop many novel behaviors, good and bad, exacerbating the need to evaluate how they behave. Prior work creates evaluations with crowdwork (which is time-consuming and expensive) or existing data sources (which

Externí odkaz: http://arxiv.org/abs/2212.09251

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání