Showing 1 - 10 of 120
for search: '"Brauner, Jan"'
Author:
Hubinger, Evan, Denison, Carson, Mu, Jesse, Lambert, Mike, Tong, Meg, MacDiarmid, Monte, Lanham, Tamera, Ziegler, Daniel M., Maxwell, Tim, Cheng, Newton, Jermyn, Adam, Askell, Amanda, Radhakrishnan, Ansh, Anil, Cem, Duvenaud, David, Ganguli, Deep, Barez, Fazl, Clark, Jack, Ndousse, Kamal, Sachan, Kshitij, Sellitto, Michael, Sharma, Mrinank, DasSarma, Nova, Grosse, Roger, Kravec, Shauna, Bai, Yuntao, Witten, Zachary, Favaro, Marina, Brauner, Jan, Karnofsky, Holden, Christiano, Paul, Bowman, Samuel R., Graham, Logan, Kaplan, Jared, Mindermann, Sören, Greenblatt, Ryan, Shlegeris, Buck, Schiefer, Nicholas, Perez, Ethan
Humans are capable of strategically deceptive behavior: behaving helpfully in most situations, but then behaving very differently in order to pursue alternative objectives when given the opportunity. If an AI system learned such a deceptive strategy, …
External link:
http://arxiv.org/abs/2401.05566
Author:
Grace, Katja, Stewart, Harlan, Sandkühler, Julia Fabienne, Thomas, Stephen, Weinstein-Raun, Ben, Brauner, Jan
In the largest survey of its kind, 2,778 researchers who had published in top-tier artificial intelligence (AI) venues gave predictions on the pace of AI progress and the nature and impacts of advanced AI systems. The aggregate forecasts give at least …
External link:
http://arxiv.org/abs/2401.02843
Author:
Bengio, Yoshua, Hinton, Geoffrey, Yao, Andrew, Song, Dawn, Abbeel, Pieter, Darrell, Trevor, Harari, Yuval Noah, Zhang, Ya-Qin, Xue, Lan, Shalev-Shwartz, Shai, Hadfield, Gillian, Clune, Jeff, Maharaj, Tegan, Hutter, Frank, Baydin, Atılım Güneş, McIlraith, Sheila, Gao, Qiqi, Acharya, Ashwin, Krueger, David, Dragan, Anca, Torr, Philip, Russell, Stuart, Kahneman, Daniel, Brauner, Jan, Mindermann, Sören
Artificial Intelligence (AI) is progressing rapidly, and companies are shifting their focus to developing generalist AI systems that can autonomously act and pursue goals. Increases in capabilities and autonomy may soon massively amplify AI's impact, …
External link:
http://arxiv.org/abs/2310.17688
Author:
Pacchiardi, Lorenzo, Chan, Alex J., Mindermann, Sören, Moscovitz, Ilan, Pan, Alexa Y., Gal, Yarin, Evans, Owain, Brauner, Jan
Large language models (LLMs) can "lie", which we define as outputting false statements despite "knowing" the truth in a demonstrable sense. LLMs might "lie", for example, when instructed to output misinformation. Here, we develop a simple lie detector …
External link:
http://arxiv.org/abs/2309.15840
Author:
Radhakrishnan, Ansh, Nguyen, Karina, Chen, Anna, Chen, Carol, Denison, Carson, Hernandez, Danny, Durmus, Esin, Hubinger, Evan, Kernion, Jackson, Lukošiūtė, Kamilė, Cheng, Newton, Joseph, Nicholas, Schiefer, Nicholas, Rausch, Oliver, McCandlish, Sam, Showk, Sheer El, Lanham, Tamera, Maxwell, Tim, Chandrasekaran, Venkatesa, Hatfield-Dodds, Zac, Kaplan, Jared, Brauner, Jan, Bowman, Samuel R., Perez, Ethan
As large language models (LLMs) perform more difficult tasks, it becomes harder to verify the correctness and safety of their behavior. One approach to help with this issue is to prompt LLMs to externalize their reasoning, e.g., by having them generate …
External link:
http://arxiv.org/abs/2307.11768
Author:
Lanham, Tamera, Chen, Anna, Radhakrishnan, Ansh, Steiner, Benoit, Denison, Carson, Hernandez, Danny, Li, Dustin, Durmus, Esin, Hubinger, Evan, Kernion, Jackson, Lukošiūtė, Kamilė, Nguyen, Karina, Cheng, Newton, Joseph, Nicholas, Schiefer, Nicholas, Rausch, Oliver, Larson, Robin, McCandlish, Sam, Kundu, Sandipan, Kadavath, Saurav, Yang, Shannon, Henighan, Thomas, Maxwell, Timothy, Telleen-Lawton, Timothy, Hume, Tristan, Hatfield-Dodds, Zac, Kaplan, Jared, Brauner, Jan, Bowman, Samuel R., Perez, Ethan
Large language models (LLMs) perform better when they produce step-by-step, "Chain-of-Thought" (CoT) reasoning before answering a question, but it is unclear if the stated reasoning is a faithful explanation of the model's actual reasoning (i.e., its …
External link:
http://arxiv.org/abs/2307.13702
Author:
Mindermann, Sören, Brauner, Jan, Razzak, Muhammed, Sharma, Mrinank, Kirsch, Andreas, Xu, Winnie, Höltgen, Benedikt, Gomez, Aidan N., Morisot, Adrien, Farquhar, Sebastian, Gal, Yarin
Training on web-scale data can take months. But most computation and time is wasted on redundant and noisy points that are already learnt or not learnable. To accelerate training, we introduce Reducible Holdout Loss Selection (RHO-LOSS), a simple but …
External link:
http://arxiv.org/abs/2206.07137
Published in:
Nature Communications volume 13, Article number: 6793 (2022)
Benchmarks are crucial to measuring and steering progress in artificial intelligence (AI). However, recent studies have raised concerns over the state of AI benchmarking, reporting issues such as benchmark overfitting, benchmark saturation, and increasing …
External link:
http://arxiv.org/abs/2203.04592
When an image classifier outputs a wrong class label, it can be helpful to see what changes in the image would lead to a correct classification. This is the aim of algorithms generating counterfactual explanations. However, there is no easily scalable …
External link:
http://arxiv.org/abs/2111.15639
Author:
Mindermann, Sören, Razzak, Muhammed, Xu, Winnie, Kirsch, Andreas, Sharma, Mrinank, Morisot, Adrien, Gomez, Aidan N., Farquhar, Sebastian, Brauner, Jan, Gal, Yarin
Published in:
ICML 2021 Workshop on Subset Selection in Machine Learning
We introduce Goldilocks Selection, a technique for faster model training which selects a sequence of training points that are "just right". We propose an information-theoretic acquisition function -- the reducible validation loss -- and compute it with …
External link:
http://arxiv.org/abs/2107.02565