Search results

Report

Prover-Verifier Games improve legibility of LLM outputs

Authors: Kirchner, Jan Hendrik, Chen, Yining, Edwards, Harri, Leike, Jan, McAleese, Nat, Burda, Yuri

One way to increase confidence in the outputs of Large Language Models (LLMs) is to support them with reasoning that is clear and easy to check -- a property we call legibility. We study legibility in the context of solving grade-school math problems

External link: http://arxiv.org/abs/2407.13692

Report

LLM Critics Help Catch LLM Bugs

Authors: McAleese, Nat, Pokorny, Rai Michael, Uribe, Juan Felipe Ceron, Nitishinskaya, Evgenia, Trebacz, Maja, Leike, Jan

Reinforcement learning from human feedback (RLHF) is fundamentally limited by the capacity of humans to correctly evaluate model output. To improve human evaluation ability and overcome that limitation this work trains "critic" models that help human

External link: http://arxiv.org/abs/2407.00215

Report

Scaling and evaluating sparse autoencoders

Authors: Gao, Leo, la Tour, Tom Dupré, Tillman, Henk, Goh, Gabriel, Troll, Rajan, Radford, Alec, Sutskever, Ilya, Leike, Jan, Wu, Jeffrey

Sparse autoencoders provide a promising unsupervised approach for extracting interpretable features from a language model by reconstructing activations from a sparse bottleneck layer. Since language models learn many concepts, autoencoders need to be

External link: http://arxiv.org/abs/2406.04093

Report

The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

Authors: Wallace, Eric, Xiao, Kai, Leike, Reimar, Weng, Lilian, Heidecke, Johannes, Beutel, Alex

Today's LLMs are susceptible to prompt injections, jailbreaks, and other attacks that allow adversaries to overwrite a model's original instructions with their own malicious prompts. In this work, we argue that one of the primary vulnerabilities unde

External link: http://arxiv.org/abs/2404.13208

Report

Re-Envisioning Numerical Information Field Theory (NIFTy.re): A Library for Gaussian Processes and Variational Inference

Authors: Edenhofer, Gordian, Frank, Philipp, Roth, Jakob, Leike, Reimar H., Guerdi, Massin, Scheel-Platz, Lukas I., Guardiani, Matteo, Eberle, Vincent, Westerkamp, Margret, Enßlin, Torsten A.

Published in: Journal of Open Source Software, volume 9(98), year 2024, page 6593

Imaging is the process of transforming noisy, incomplete data into a space that humans can interpret. NIFTy is a Bayesian framework for imaging and has already successfully been applied to many fields in astrophysics. Previous design decisions held t

External link: http://arxiv.org/abs/2402.16683

Report

Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision

Authors: Burns, Collin, Izmailov, Pavel, Kirchner, Jan Hendrik, Baker, Bowen, Gao, Leo, Aschenbrenner, Leopold, Chen, Yining, Ecoffet, Adrien, Joglekar, Manas, Leike, Jan, Sutskever, Ilya, Wu, Jeff

Widely used alignment techniques, such as reinforcement learning from human feedback (RLHF), rely on the ability of humans to supervise model behavior - for example, to evaluate whether a model faithfully followed instructions or generated safe outpu

External link: http://arxiv.org/abs/2312.09390

Academic article

Discovery and characterization of novel potent non-covalent small molecule inhibitors targeting papain-like protease from SARS-CoV-2

Authors: Miao Zheng, Bo Feng, Yumin Zhang, Xin Liu, Na Zhao, Hui Liu, Zichao Xu, Xinheng He, Zhiyan Qu, Shizhao Chen, Zhidong Jiang, Xi Cheng, Hong Liu, Yi Zang, Linxiang Zhao, Jie Zheng, Leike Zhang, Jia Li, Yu Zhou

Published in: Acta Pharmaceutica Sinica B, Vol 14, Iss 7, Pp 3286-3290 (2024)

External link: https://doaj.org/article/0507dae3bcfe403cb289ec049acee92e

Report

Let's Verify Step by Step

Authors: Lightman, Hunter, Kosaraju, Vineet, Burda, Yura, Edwards, Harri, Baker, Bowen, Lee, Teddy, Leike, Jan, Schulman, John, Sutskever, Ilya, Cobbe, Karl

In recent years, large language models have greatly improved in their ability to perform complex multi-step reasoning. However, even state-of-the-art models still regularly produce logical mistakes. To train more reliable models, we can turn either t

External link: http://arxiv.org/abs/2305.20050

Report

GPT-4 Technical Report

Authors: OpenAI, Achiam, Josh, Adler, Steven, Agarwal, Sandhini, Ahmad, Lama, Akkaya, Ilge, Aleman, Florencia Leoni, Almeida, Diogo, Altenschmidt, Janko, Altman, Sam, Anadkat, Shyamal, Avila, Red, Babuschkin, Igor, Balaji, Suchir, Balcom, Valerie, Baltescu, Paul, Bao, Haiming, Bavarian, Mohammad, Belgum, Jeff, Bello, Irwan, Berdine, Jake, Bernadett-Shapiro, Gabriel, Berner, Christopher, Bogdonoff, Lenny, Boiko, Oleg, Boyd, Madelaine, Brakman, Anna-Luisa, Brockman, Greg, Brooks, Tim, Brundage, Miles, Button, Kevin, Cai, Trevor, Campbell, Rosie, Cann, Andrew, Carey, Brittany, Carlson, Chelsea, Carmichael, Rory, Chan, Brooke, Chang, Che, Chantzis, Fotis, Chen, Derek, Chen, Sully, Chen, Ruby, Chen, Jason, Chen, Mark, Chess, Ben, Cho, Chester, Chu, Casey, Chung, Hyung Won, Cummings, Dave, Currier, Jeremiah, Dai, Yunxing, Decareaux, Cory, Degry, Thomas, Deutsch, Noah, Deville, Damien, Dhar, Arka, Dohan, David, Dowling, Steve, Dunning, Sheila, Ecoffet, Adrien, Eleti, Atty, Eloundou, Tyna, Farhi, David, Fedus, Liam, Felix, Niko, Fishman, Simón Posada, Forte, Juston, Fulford, Isabella, Gao, Leo, Georges, Elie, Gibson, Christian, Goel, Vik, Gogineni, Tarun, Goh, Gabriel, Gontijo-Lopes, Rapha, Gordon, Jonathan, Grafstein, Morgan, Gray, Scott, Greene, Ryan, Gross, Joshua, Gu, Shixiang Shane, Guo, Yufei, Hallacy, Chris, Han, Jesse, Harris, Jeff, He, Yuchen, Heaton, Mike, Heidecke, Johannes, Hesse, Chris, Hickey, Alan, Hickey, Wade, Hoeschele, Peter, Houghton, Brandon, Hsu, Kenny, Hu, Shengli, Hu, Xin, Huizinga, Joost, Jain, Shantanu, Jain, Shawn, Jang, Joanne, Jiang, Angela, Jiang, Roger, Jin, Haozhun, Jin, Denny, Jomoto, Shino, Jonn, Billie, Jun, Heewoo, Kaftan, Tomer, Kaiser, Łukasz, Kamali, Ali, Kanitscheider, Ingmar, Keskar, Nitish Shirish, Khan, Tabarak, Kilpatrick, Logan, Kim, Jong Wook, Kim, Christina, Kim, Yongjik, Kirchner, Jan Hendrik, Kiros, Jamie, Knight, Matt, Kokotajlo, Daniel, Kondraciuk, Łukasz, Kondrich, Andrew, Konstantinidis, Aris, Kosic, Kyle, Krueger,
Gretchen, Kuo, Vishal, Lampe, Michael, Lan, Ikai, Lee, Teddy, Leike, Jan, Leung, Jade, Levy, Daniel, Li, Chak Ming, Lim, Rachel, Lin, Molly, Lin, Stephanie, Litwin, Mateusz, Lopez, Theresa, Lowe, Ryan, Lue, Patricia, Makanju, Anna, Malfacini, Kim, Manning, Sam, Markov, Todor, Markovski, Yaniv, Martin, Bianca, Mayer, Katie, Mayne, Andrew, McGrew, Bob, McKinney, Scott Mayer, McLeavey, Christine, McMillan, Paul, McNeil, Jake, Medina, David, Mehta, Aalok, Menick, Jacob, Metz, Luke, Mishchenko, Andrey, Mishkin, Pamela, Monaco, Vinnie, Morikawa, Evan, Mossing, Daniel, Mu, Tong, Murati, Mira, Murk, Oleg, Mély, David, Nair, Ashvin, Nakano, Reiichiro, Nayak, Rajeev, Neelakantan, Arvind, Ngo, Richard, Noh, Hyeonwoo, Ouyang, Long, O'Keefe, Cullen, Pachocki, Jakub, Paino, Alex, Palermo, Joe, Pantuliano, Ashley, Parascandolo, Giambattista, Parish, Joel, Parparita, Emy, Passos, Alex, Pavlov, Mikhail, Peng, Andrew, Perelman, Adam, Peres, Filipe de Avila Belbute, Petrov, Michael, Pinto, Henrique Ponde de Oliveira, Michael, Pokorny, Pokrass, Michelle, Pong, Vitchyr H., Powell, Tolly, Power, Alethea, Power, Boris, Proehl, Elizabeth, Puri, Raul, Radford, Alec, Rae, Jack, Ramesh, Aditya, Raymond, Cameron, Real, Francis, Rimbach, Kendra, Ross, Carl, Rotsted, Bob, Roussez, Henri, Ryder, Nick, Saltarelli, Mario, Sanders, Ted, Santurkar, Shibani, Sastry, Girish, Schmidt, Heather, Schnurr, David, Schulman, John, Selsam, Daniel, Sheppard, Kyla, Sherbakov, Toki, Shieh, Jessica, Shoker, Sarah, Shyam, Pranav, Sidor, Szymon, Sigler, Eric, Simens, Maddie, Sitkin, Jordan, Slama, Katarina, Sohl, Ian, Sokolowsky, Benjamin, Song, Yang, Staudacher, Natalie, Such, Felipe Petroski, Summers, Natalie, Sutskever, Ilya, Tang, Jie, Tezak, Nikolas, Thompson, Madeleine B., Tillet, Phil, Tootoonchian, Amin, Tseng, Elizabeth, Tuggle, Preston, Turley, Nick, Tworek, Jerry, Uribe, Juan Felipe Cerón, Vallone, Andrea, Vijayvergiya, Arun, Voss, Chelsea, Wainwright, Carroll, Wang, Justin Jay, Wang, Alvin, Wang, Ben, 
Ward, Jonathan, Wei, Jason, Weinmann, CJ, Welihinda, Akila, Welinder, Peter, Weng, Jiayi, Weng, Lilian, Wiethoff, Matt, Willner, Dave, Winter, Clemens, Wolrich, Samuel, Wong, Hannah, Workman, Lauren, Wu, Sherwin, Wu, Jeff, Wu, Michael, Xiao, Kai, Xu, Tao, Yoo, Sarah, Yu, Kevin, Yuan, Qiming, Zaremba, Wojciech, Zellers, Rowan, Zhang, Chong, Zhang, Marvin, Zhao, Shengjia, Zheng, Tianhao, Zhuang, Juntang, Zhuk, William, Zoph, Barret

We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various profes

External link: http://arxiv.org/abs/2303.08774

Academic article

A parallel multi-objective optimization based on adaptive surrogate model for combined operation of multiple hydraulic facilities in water diversion project

Authors: Xiaolian Liu, Zirong Liu, Xiaopeng Hou, Yu Tian, Xueni Wang, Leike Zhang, Hao Wang

Published in: Journal of Hydroinformatics, Vol 26, Iss 6, Pp 1351-1369 (2024)

In a complex pressurized water diversion project (WDP), the combined optimal operation of multiple hydraulic facilities is computationally expensive owing to the requirement of massive mathematical simulation model runs. A parallel multi-objective op

External link: https://doaj.org/article/dfe341e32e0c41a4934a17f3181f7b34
