Showing 1 - 10 of 89 for search: '"BOWMAN, SAM"'
Backdoors are hidden behaviors that are only triggered once an AI system has been deployed. Bad actors looking to create successful backdoors must design them to avoid activation during training and evaluation. Since data used in these stages often o
External link:
http://arxiv.org/abs/2407.04108
Author:
Ganguli, Deep; Lovitt, Liane; Kernion, Jackson; Askell, Amanda; Bai, Yuntao; Kadavath, Saurav; Mann, Ben; Perez, Ethan; Schiefer, Nicholas; Ndousse, Kamal; Jones, Andy; Bowman, Sam; Chen, Anna; Conerly, Tom; DasSarma, Nova; Drain, Dawn; Elhage, Nelson; El-Showk, Sheer; Fort, Stanislav; Hatfield-Dodds, Zac; Henighan, Tom; Hernandez, Danny; Hume, Tristan; Jacobson, Josh; Johnston, Scott; Kravec, Shauna; Olsson, Catherine; Ringer, Sam; Tran-Johnson, Eli; Amodei, Dario; Brown, Tom; Joseph, Nicholas; McCandlish, Sam; Olah, Chris; Kaplan, Jared; Clark, Jack
We describe our early efforts to red team language models in order to simultaneously discover, measure, and attempt to reduce their potentially harmful outputs. We make three main contributions. First, we investigate scaling behaviors for red teaming
External link:
http://arxiv.org/abs/2209.07858
Author:
Kadavath, Saurav; Conerly, Tom; Askell, Amanda; Henighan, Tom; Drain, Dawn; Perez, Ethan; Schiefer, Nicholas; Hatfield-Dodds, Zac; DasSarma, Nova; Tran-Johnson, Eli; Johnston, Scott; El-Showk, Sheer; Jones, Andy; Elhage, Nelson; Hume, Tristan; Chen, Anna; Bai, Yuntao; Bowman, Sam; Fort, Stanislav; Ganguli, Deep; Hernandez, Danny; Jacobson, Josh; Kernion, Jackson; Kravec, Shauna; Lovitt, Liane; Ndousse, Kamal; Olsson, Catherine; Ringer, Sam; Amodei, Dario; Brown, Tom; Clark, Jack; Joseph, Nicholas; Mann, Ben; McCandlish, Sam; Olah, Chris; Kaplan, Jared
We study whether language models can evaluate the validity of their own claims and predict which questions they will be able to answer correctly. We first show that larger models are well-calibrated on diverse multiple choice and true/false questions
External link:
http://arxiv.org/abs/2207.05221
Individuals on social media may reveal themselves to be in various states of crisis (e.g. suicide, self-harm, abuse, or eating disorders). Detecting crisis from social media text automatically and accurately can have profound consequences. However, d
External link:
http://arxiv.org/abs/1705.09585
Author:
Bowman, Sam (AUTHOR)
Published in:
Policy. Spring 2017, Vol. 33 Issue 3, p35-40. 6p.
Author:
Bowman, Sam
Published in:
Australian & New Zealand Grapegrower & Winemaker. Nov 2021, Issue 694, p18-22. 4p.
Author:
Bowman, Sam
Published in:
Australian & New Zealand Grapegrower & Winemaker. May 2021, Issue 688, p64-69. 4p.
Author:
Bowman, Sam
Published in:
Humanising Language Teaching; Apr 2024, Vol. 26 Issue 2, pN.PAG-N.PAG, 1p
Author:
Bowman, Sam
Published in:
Australian & New Zealand Grapegrower & Winemaker. Oct 2020, Issue 681, p26-30. 4p.