Showing 1 - 10 of 816 for search: '"Barez A"'
Sparse Autoencoders (SAEs) have shown promise in improving the interpretability of neural network activations, but can learn features that are not features of the input, limiting their effectiveness. We propose Mutual Feature Regularization
External link:
http://arxiv.org/abs/2411.01220
Preference learning is a central component for aligning current LLMs, but this process can be vulnerable to data poisoning attacks. To address this concern, we introduce PoisonBench, a benchmark for evaluating large language models' susceptibility to
External link:
http://arxiv.org/abs/2410.08811
Vision-Language Models (VLMs) are powerful tools for processing and understanding text and images. We study the processing of visual tokens in the language model component of LLaVA, a prominent VLM. Our approach focuses on analyzing the localization
External link:
http://arxiv.org/abs/2410.07149
We investigate feature universality in large language models (LLMs), a research field that aims to understand how different models similarly represent concepts in the latent spaces of their intermediate layers. Demonstrating feature universality allo
External link:
http://arxiv.org/abs/2410.06981
Published in:
Brain Sciences, Vol 13, Iss 7, p 1081 (2023)
In the original publication [...]
External link:
https://doaj.org/article/cfe01e0eb03f4463a2effa1e24b68e6e
Published in:
Brain Sciences, Vol 13, Iss 4, p 544 (2023)
The organizational strategy and environment of the healthcare systems influence the turnover rate among healthcare provider personnel. These critical factors have received scant attention in the literature and particularly in the healthcare systems o
External link:
https://doaj.org/article/fb8ca45182574c869be395ef854a82fc
Authors:
Denison, Carson, MacDiarmid, Monte, Barez, Fazl, Duvenaud, David, Kravec, Shauna, Marks, Samuel, Schiefer, Nicholas, Soklaski, Ryan, Tamkin, Alex, Kaplan, Jared, Shlegeris, Buck, Bowman, Samuel R., Perez, Ethan, Hubinger, Evan
In reinforcement learning, specification gaming occurs when AI systems learn undesired behaviors that are highly rewarded due to misspecified training goals. Specification gaming can range from simple behaviors like sycophancy to sophisticated and pe
External link:
http://arxiv.org/abs/2406.10162
Published in:
Brain Sciences, Vol 13, Iss 2, p 251 (2023)
Job satisfaction and burnout are components of job morale. In general, and among healthcare provider personnel, these are psychological factors of the job and under the influence of different conditions and the organizational management of the health
External link:
https://doaj.org/article/54b5557f61a744ce9296da3f264d593a
Authors:
Eiras, Francisco, Petrov, Aleksandar, Vidgen, Bertie, Schroeder, Christian, Pizzati, Fabio, Elkins, Katherine, Mukhopadhyay, Supratik, Bibi, Adel, Purewal, Aaron, Botos, Csaba, Steibel, Fabro, Keshtkar, Fazel, Barez, Fazl, Smith, Genevieve, Guadagni, Gianluca, Chun, Jon, Cabot, Jordi, Imperial, Joseph, Nolazco, Juan Arturo, Landay, Lori, Jackson, Matthew, Torr, Phillip H. S., Darrell, Trevor, Lee, Yong, Foerster, Jakob
Applications of Generative AI (Gen AI) are expected to revolutionize a number of different areas, ranging from science & medicine to education. The potential for these seismic changes has triggered a lively debate about the potential risks of the tec
External link:
http://arxiv.org/abs/2405.08597
In certain situations, neural networks will represent environment states in their hidden activations. Our goal is to visualize what environment states the networks are representing. We experiment with a recurrent neural network (RNN) architecture wit
External link:
http://arxiv.org/abs/2405.06409