Showing 1 - 10 of 3,135 for search: '"Mallén, A."'
Subversion Strategy Eval: Evaluating AI's stateless strategic capabilities against control protocols
AI control protocols are plans for usefully deploying AI systems in a way that is safe, even if the AI intends to subvert the protocol. Previous work evaluated protocols by subverting them with a human-AI red team, where an AI follows the human-writt
External link:
http://arxiv.org/abs/2412.12480
While the activations of neurons in deep neural networks usually do not have a simple human-understandable interpretation, sparse autoencoders (SAEs) can be used to transform these activations into a higher-dimensional latent space which may be more
External link:
http://arxiv.org/abs/2410.13928
Author:
Mallen, Alex, Belrose, Nora
Scalable oversight studies methods of training and evaluating AI systems in domains where human judgment is unreliable or expensive, such as scientific research and software engineering in complex codebases. Most work in this area has focused on meth
External link:
http://arxiv.org/abs/2410.13215
The distributional simplicity bias (DSB) posits that neural networks learn low-order moments of the data distribution first, before moving on to higher-order correlations. In this work, we present compelling new evidence for the DSB by showing that n
External link:
http://arxiv.org/abs/2402.04362
Author:
Aceto, Luca, Fábregas, Ignacio, García-Pérez, Álvaro, Ingólfsdóttir, Anna, Ortega-Mallén, Yolanda
The nominal transition systems (NTSs) of Parrow et al. describe the operational semantics of nominal process calculi. We study NTSs in terms of the nominal residual transition systems (NRTSs) that we introduce. We provide rule formats for the specifi
External link:
http://arxiv.org/abs/2402.00982
Eliciting Latent Knowledge (ELK) aims to find patterns in a capable neural network's activations that robustly track the true state of the world, especially in hard-to-verify cases where the model's output is untrusted. To further ELK research, we in
External link:
http://arxiv.org/abs/2312.01037
Author:
Zou, Andy, Phan, Long, Chen, Sarah, Campbell, James, Guo, Phillip, Ren, Richard, Pan, Alexander, Yin, Xuwang, Mazeika, Mantas, Dombrowski, Ann-Kathrin, Goel, Shashwat, Li, Nathaniel, Byun, Michael J., Wang, Zifan, Mallen, Alex, Basart, Steven, Koyejo, Sanmi, Song, Dawn, Fredrikson, Matt, Kolter, J. Zico, Hendrycks, Dan
In this paper, we identify and characterize the emerging area of representation engineering (RepE), an approach to enhancing the transparency of AI systems that draws on insights from cognitive neuroscience. RepE places population-level representatio
External link:
http://arxiv.org/abs/2310.01405
Lessons from community engagement to improve COVID-19 diagnosis and treatment in Cochabamba, Bolivia
Author:
Elizabeth Posada, Nilce Mendoza, Cristina Alonso-Vega, Claire Billot, Beatriz Mallén Muñoz, Leonardo de la Torre, Adalid Paiva, Luis Villarroel, Regina Rabinovich
Published in:
Global Health Action, Vol 17, Iss 1 (2024)
Background Community engagement is recognized as a vital component of health-related research and programs, particularly during infectious disease outbreaks and epidemics. Despite the importance of engaging communities in the response to COVID-19, re
External link:
https://doaj.org/article/3e2fe8f6dab045cd9796f65ea5a504c7
Published in:
Autophagy Reports, Vol 3, Iss 1 (2024)
The conjugation of ATG8 (autophagy-related 8) proteins to the lipid phosphatidylethanolamine (PE) is the result of the coordinated and highly regulated action of several ATG core proteins, including ATG4 proteases and the E1 (ATG7)- and E2 (ATG3)- ac
External link:
https://doaj.org/article/53b6ce16bfa240ab8b36b8a6743f2b50
Author:
Mallen, Alex, Asai, Akari, Zhong, Victor, Das, Rajarshi, Khashabi, Daniel, Hajishirzi, Hannaneh
Despite their impressive performance on diverse tasks, large language models (LMs) still struggle with tasks requiring rich world knowledge, implying the limitations of relying solely on their parameters to encode a wealth of world knowledge. This pa
External link:
http://arxiv.org/abs/2212.10511