Showing 1 - 10 of 43
for search: '"Ganguli, Deep"'
Author:
Benton, Joe, Wagner, Misha, Christiansen, Eric, Anil, Cem, Perez, Ethan, Srivastav, Jai, Durmus, Esin, Ganguli, Deep, Kravec, Shauna, Shlegeris, Buck, Kaplan, Jared, Karnofsky, Holden, Hubinger, Evan, Grosse, Roger, Bowman, Samuel R., Duvenaud, David
Sufficiently capable models could subvert human oversight and decision-making in important contexts. For example, in the context of AI development, models could covertly sabotage efforts to evaluate their own dangerous capabilities, to monitor their behavior, or to make decisions about their deployment. …
External link:
http://arxiv.org/abs/2410.21514
Author:
Summerfield, Christopher, Argyle, Lisa, Bakker, Michiel, Collins, Teddy, Durmus, Esin, Eloundou, Tyna, Gabriel, Iason, Ganguli, Deep, Hackenburg, Kobi, Hadfield, Gillian, Hewitt, Luke, Huang, Saffron, Landemore, Helene, Marchal, Nahema, Ovadya, Aviv, Procaccia, Ariel, Risse, Mathias, Schneier, Bruce, Seger, Elizabeth, Siddarth, Divya, Sætra, Henrik Skaug, Tessler, MH, Botvinick, Matthew
Advanced AI systems capable of generating humanlike text and multimodal content are now widely available. In this paper, we discuss the impacts that generative artificial intelligence may have on democratic processes. We consider the consequences of …
External link:
http://arxiv.org/abs/2409.06729
Author:
Huang, Saffron, Siddarth, Divya, Lovitt, Liane, Liao, Thomas I., Durmus, Esin, Tamkin, Alex, Ganguli, Deep
Published in:
Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency. 1395-1417
There is growing consensus that language model (LM) developers should not be the sole deciders of LM behavior, creating a need for methods that enable the broader public to collectively shape the behavior of LM systems that affect them. To address this …
External link:
http://arxiv.org/abs/2406.07814
Author:
Hubinger, Evan, Denison, Carson, Mu, Jesse, Lambert, Mike, Tong, Meg, MacDiarmid, Monte, Lanham, Tamera, Ziegler, Daniel M., Maxwell, Tim, Cheng, Newton, Jermyn, Adam, Askell, Amanda, Radhakrishnan, Ansh, Anil, Cem, Duvenaud, David, Ganguli, Deep, Barez, Fazl, Clark, Jack, Ndousse, Kamal, Sachan, Kshitij, Sellitto, Michael, Sharma, Mrinank, DasSarma, Nova, Grosse, Roger, Kravec, Shauna, Bai, Yuntao, Witten, Zachary, Favaro, Marina, Brauner, Jan, Karnofsky, Holden, Christiano, Paul, Bowman, Samuel R., Graham, Logan, Kaplan, Jared, Mindermann, Sören, Greenblatt, Ryan, Shlegeris, Buck, Schiefer, Nicholas, Perez, Ethan
Humans are capable of strategically deceptive behavior: behaving helpfully in most situations, but then behaving very differently in order to pursue alternative objectives when given the opportunity. If an AI system learned such a deceptive strategy, could we detect it and remove it using current state-of-the-art safety training techniques? …
External link:
http://arxiv.org/abs/2401.05566
Author:
Tamkin, Alex, Askell, Amanda, Lovitt, Liane, Durmus, Esin, Joseph, Nicholas, Kravec, Shauna, Nguyen, Karina, Kaplan, Jared, Ganguli, Deep
As language models (LMs) advance, interest is growing in applying them to high-stakes societal decisions, such as determining financing or housing eligibility. However, their potential for discrimination in such contexts raises ethical concerns, motivating …
External link:
http://arxiv.org/abs/2312.03689
Author:
Cooper, A. Feder, Lee, Katherine, Grimmelmann, James, Ippolito, Daphne, Callison-Burch, Christopher, Choquette-Choo, Christopher A., Mireshghallah, Niloofar, Brundage, Miles, Mimno, David, Choksi, Madiha Zahrah, Balkin, Jack M., Carlini, Nicholas, De Sa, Christopher, Frankle, Jonathan, Ganguli, Deep, Gipson, Bryant, Guadamuz, Andres, Harris, Swee Leng, Jacobs, Abigail Z., Joh, Elizabeth, Kamath, Gautam, Lemley, Mark, Matthews, Cass, McLeavey, Christine, McSherry, Corynne, Nasr, Milad, Ohm, Paul, Roberts, Adam, Rubin, Tom, Samuelson, Pamela, Schubert, Ludwig, Vaccaro, Kristen, Villa, Luis, Wu, Felix, Zeide, Elana
This report presents the takeaways of the inaugural Workshop on Generative AI and Law (GenLaw), held in July 2023. A cross-disciplinary group of practitioners and scholars from computer science and law convened to discuss the technical, doctrinal, and policy …
External link:
http://arxiv.org/abs/2311.06477
Author:
Durmus, Esin, Nguyen, Karina, Liao, Thomas I., Schiefer, Nicholas, Askell, Amanda, Bakhtin, Anton, Chen, Carol, Hatfield-Dodds, Zac, Hernandez, Danny, Joseph, Nicholas, Lovitt, Liane, McCandlish, Sam, Sikder, Orowa, Tamkin, Alex, Thamkul, Janel, Kaplan, Jared, Clark, Jack, Ganguli, Deep
Large language models (LLMs) may not equitably represent diverse global perspectives on societal issues. In this paper, we develop a quantitative framework to evaluate whose opinions model-generated responses are more similar to. We first build a dataset …
External link:
http://arxiv.org/abs/2306.16388
Author:
Small, Christopher T., Vendrov, Ivan, Durmus, Esin, Homaei, Hadjar, Barry, Elizabeth, Cornebise, Julien, Suzman, Ted, Ganguli, Deep, Megill, Colin
Polis is a platform that leverages machine intelligence to scale up deliberative processes. In this paper, we explore the opportunities and risks associated with applying Large Language Models (LLMs) towards challenges with facilitating, moderating and summarizing …
External link:
http://arxiv.org/abs/2306.11932
Author:
Ganguli, Deep, Askell, Amanda, Schiefer, Nicholas, Liao, Thomas I., Lukošiūtė, Kamilė, Chen, Anna, Goldie, Anna, Mirhoseini, Azalia, Olsson, Catherine, Hernandez, Danny, Drain, Dawn, Li, Dustin, Tran-Johnson, Eli, Perez, Ethan, Kernion, Jackson, Kerr, Jamie, Mueller, Jared, Landau, Joshua, Ndousse, Kamal, Nguyen, Karina, Lovitt, Liane, Sellitto, Michael, Elhage, Nelson, Mercado, Noemi, DasSarma, Nova, Rausch, Oliver, Lasenby, Robert, Larson, Robin, Ringer, Sam, Kundu, Sandipan, Kadavath, Saurav, Johnston, Scott, Kravec, Shauna, Showk, Sheer El, Lanham, Tamera, Telleen-Lawton, Timothy, Henighan, Tom, Hume, Tristan, Bai, Yuntao, Hatfield-Dodds, Zac, Mann, Ben, Amodei, Dario, Joseph, Nicholas, McCandlish, Sam, Brown, Tom, Olah, Christopher, Clark, Jack, Bowman, Samuel R., Kaplan, Jared
We test the hypothesis that language models trained with reinforcement learning from human feedback (RLHF) have the capability to "morally self-correct" -- to avoid producing harmful outputs -- if instructed to do so. We find strong evidence in support of this hypothesis …
External link:
http://arxiv.org/abs/2302.07459
Author:
Perez, Ethan, Ringer, Sam, Lukošiūtė, Kamilė, Nguyen, Karina, Chen, Edwin, Heiner, Scott, Pettit, Craig, Olsson, Catherine, Kundu, Sandipan, Kadavath, Saurav, Jones, Andy, Chen, Anna, Mann, Ben, Israel, Brian, Seethor, Bryan, McKinnon, Cameron, Olah, Christopher, Yan, Da, Amodei, Daniela, Amodei, Dario, Drain, Dawn, Li, Dustin, Tran-Johnson, Eli, Khundadze, Guro, Kernion, Jackson, Landis, James, Kerr, Jamie, Mueller, Jared, Hyun, Jeeyoon, Landau, Joshua, Ndousse, Kamal, Goldberg, Landon, Lovitt, Liane, Lucas, Martin, Sellitto, Michael, Zhang, Miranda, Kingsland, Neerav, Elhage, Nelson, Joseph, Nicholas, Mercado, Noemí, DasSarma, Nova, Rausch, Oliver, Larson, Robin, McCandlish, Sam, Johnston, Scott, Kravec, Shauna, Showk, Sheer El, Lanham, Tamera, Telleen-Lawton, Timothy, Brown, Tom, Henighan, Tom, Hume, Tristan, Bai, Yuntao, Hatfield-Dodds, Zac, Clark, Jack, Bowman, Samuel R., Askell, Amanda, Grosse, Roger, Hernandez, Danny, Ganguli, Deep, Hubinger, Evan, Schiefer, Nicholas, Kaplan, Jared
As language models (LMs) scale, they develop many novel behaviors, good and bad, exacerbating the need to evaluate how they behave. Prior work creates evaluations with crowdwork (which is time-consuming and expensive) or existing data sources (which are not always available). …
External link:
http://arxiv.org/abs/2212.09251