Showing 1 - 10 of 624 for search: '"Mażeika, P."'
Author:
Bengio, Yoshua, Mindermann, Sören, Privitera, Daniel, Besiroglu, Tamay, Bommasani, Rishi, Casper, Stephen, Choi, Yejin, Goldfarb, Danielle, Heidari, Hoda, Khalatbari, Leila, Longpre, Shayne, Mavroudis, Vasilios, Mazeika, Mantas, Ng, Kwan Yee, Okolo, Chinasa T., Raji, Deborah, Skeadas, Theodora, Tramèr, Florian, Adekanmbi, Bayo, Christiano, Paul, Dalrymple, David, Dietterich, Thomas G., Felten, Edward, Fung, Pascale, Gourinchas, Pierre-Olivier, Jennings, Nick, Krause, Andreas, Liang, Percy, Ludermir, Teresa, Marda, Vidushi, Margetts, Helen, McDermid, John A., Narayanan, Arvind, Nelson, Alondra, Oh, Alice, Ramchurn, Gopal, Russell, Stuart, Schaake, Marietje, Song, Dawn, Soto, Alvaro, Tiedrich, Lee, Varoquaux, Gaël, Yao, Andrew, Zhang, Ya-Qin
This is the interim publication of the first International Scientific Report on the Safety of Advanced AI. The report synthesises the scientific understanding of general-purpose AI -- AI that can perform a wide variety of tasks -- with a focus on und
External link:
http://arxiv.org/abs/2412.05282
Author:
Tamirisa, Rishub, Bharathi, Bhrugu, Phan, Long, Zhou, Andy, Gatti, Alice, Suresh, Tarun, Lin, Maxwell, Wang, Justin, Wang, Rowan, Arel, Ron, Zou, Andy, Song, Dawn, Li, Bo, Hendrycks, Dan, Mazeika, Mantas
Rapid advances in the capabilities of large language models (LLMs) have raised widespread concerns regarding their potential for malicious use. Open-weight LLMs present unique challenges, as existing safeguards lack robustness to tampering attacks th
External link:
http://arxiv.org/abs/2408.00761
Author:
Ren, Richard, Basart, Steven, Khoja, Adam, Gatti, Alice, Phan, Long, Yin, Xuwang, Mazeika, Mantas, Pan, Alexander, Mukobi, Gabriel, Kim, Ryan H., Fitz, Stephen, Hendrycks, Dan
As artificial intelligence systems grow more powerful, there has been increasing interest in "AI safety" research to address emerging and future risks. However, the field of AI safety remains poorly defined and inconsistently measured, leading to con
External link:
http://arxiv.org/abs/2407.21792
Author:
Li, Nathaniel, Pan, Alexander, Gopal, Anjali, Yue, Summer, Berrios, Daniel, Gatti, Alice, Li, Justin D., Dombrowski, Ann-Kathrin, Goel, Shashwat, Phan, Long, Mukobi, Gabriel, Helm-Burger, Nathan, Lababidi, Rassin, Justen, Lennart, Liu, Andrew B., Chen, Michael, Barrass, Isabelle, Zhang, Oliver, Zhu, Xiaoyuan, Tamirisa, Rishub, Bharathi, Bhrugu, Khoja, Adam, Zhao, Zhenqi, Herbert-Voss, Ariel, Breuer, Cort B., Marks, Samuel, Patel, Oam, Zou, Andy, Mazeika, Mantas, Wang, Zifan, Oswal, Palash, Lin, Weiran, Hunt, Adam A., Tienken-Harder, Justin, Shih, Kevin Y., Talley, Kemper, Guan, John, Kaplan, Russell, Steneker, Ian, Campbell, David, Jokubaitis, Brad, Levinson, Alex, Wang, Jean, Qian, William, Karmakar, Kallol Krishna, Basart, Steven, Fitz, Stephen, Levine, Mindy, Kumaraguru, Ponnurangam, Tupakula, Uday, Varadharajan, Vijay, Wang, Ruoyu, Shoshitaishvili, Yan, Ba, Jimmy, Esvelt, Kevin M., Wang, Alexandr, Hendrycks, Dan
The White House Executive Order on Artificial Intelligence highlights the risks of large language models (LLMs) empowering malicious actors in developing biological, cyber, and chemical weapons. To measure these risks of malicious use, government ins
External link:
http://arxiv.org/abs/2403.03218
Author:
Mazeika, Mantas, Phan, Long, Yin, Xuwang, Zou, Andy, Wang, Zifan, Mu, Norman, Sakhaee, Elham, Li, Nathaniel, Basart, Steven, Li, Bo, Forsyth, David, Hendrycks, Dan
Automated red teaming holds substantial promise for uncovering and mitigating the risks associated with the malicious use of large language models (LLMs), yet the field lacks a standardized evaluation framework to rigorously assess new methods. To ad
External link:
http://arxiv.org/abs/2402.04249
Author:
Zou, Andy, Phan, Long, Chen, Sarah, Campbell, James, Guo, Phillip, Ren, Richard, Pan, Alexander, Yin, Xuwang, Mazeika, Mantas, Dombrowski, Ann-Kathrin, Goel, Shashwat, Li, Nathaniel, Byun, Michael J., Wang, Zifan, Mallen, Alex, Basart, Steven, Koyejo, Sanmi, Song, Dawn, Fredrikson, Matt, Kolter, J. Zico, Hendrycks, Dan
In this paper, we identify and characterize the emerging area of representation engineering (RepE), an approach to enhancing the transparency of AI systems that draws on insights from cognitive neuroscience. RepE places population-level representatio
External link:
http://arxiv.org/abs/2310.01405
Rapid advancements in artificial intelligence (AI) have sparked growing concerns among experts, policymakers, and world leaders regarding the potential for increasingly advanced AI systems to pose catastrophic risks. Although numerous risks have been
External link:
http://arxiv.org/abs/2306.12001
Author:
Wang, Boxin, Chen, Weixin, Pei, Hengzhi, Xie, Chulin, Kang, Mintong, Zhang, Chenhui, Xu, Chejian, Xiong, Zidi, Dutta, Ritik, Schaeffer, Rylan, Truong, Sang T., Arora, Simran, Mazeika, Mantas, Hendrycks, Dan, Lin, Zinan, Cheng, Yu, Koyejo, Sanmi, Song, Dawn, Li, Bo
Generative Pre-trained Transformer (GPT) models have exhibited exciting progress in their capabilities, capturing the interest of practitioners and the public alike. Yet, while the literature on the trustworthiness of GPT models remains limited, prac
External link:
http://arxiv.org/abs/2306.11698
Author:
Pashchenko, V., Bludov, O., Baltrunas, D., Mazeika, K., Motria, S., Glukhov, K., Vysochanskii, Yu.
Published in:
Condensed Matter Physics, 2022, vol. 25, No. 4, 43701
The experimental studies of the paramagnetic-antiferromagnetic phase transition through Mössbauer spectroscopy and measurements of temperature and field dependencies of magnetic susceptibility in the layered Cu$_{0.15}$Fe$_{0.85}$PS$_3$ crystal a
External link:
http://arxiv.org/abs/2301.01338
Author:
Mazeika, Mantas, Tang, Eric, Zou, Andy, Basart, Steven, Chan, Jun Shern, Song, Dawn, Forsyth, David, Steinhardt, Jacob, Hendrycks, Dan
In recent years, deep neural networks have demonstrated increasingly strong abilities to recognize objects and activities in videos. However, as video understanding becomes widely used in real-world applications, a key consideration is developing hum
External link:
http://arxiv.org/abs/2210.10039