Showing 1 - 2 of 2 for search: '"Ayonrinde, Kola"'
Author:
Ayonrinde, Kola
Sparse autoencoders (SAEs) are a promising approach to extracting features from neural networks, enabling model interpretability as well as causal interventions on model internals. SAEs generate sparse feature representations using a sparsifying acti
External link:
http://arxiv.org/abs/2411.02124
Sparse Autoencoders (SAEs) have emerged as a useful tool for interpreting the internal representations of neural networks. However, naively optimising SAEs for reconstruction loss and sparsity results in a preference for SAEs that are extremely wide
External link:
http://arxiv.org/abs/2410.11179