Showing 1 - 5 of 5 for search: '"Chanin, David"'
Sparse Autoencoders (SAEs) have emerged as a promising approach to decompose the activations of Large Language Models (LLMs) into human-interpretable latents. In this paper, we pose two questions. First, to what extent do SAEs extract monosemantic an
External link:
http://arxiv.org/abs/2409.14507
Authors:
Tan, Daniel, Chanin, David, Lynch, Aengus, Kanoulas, Dimitrios, Paige, Brooks, Garriga-Alonso, Adria, Kirk, Robert
Steering vectors (SVs) are a new approach to efficiently adjust language model behaviour at inference time by intervening on intermediate model activations. They have shown promise in terms of improving both capabilities and model alignment. However,
External link:
http://arxiv.org/abs/2407.12404
Transformer language models (LMs) have been shown to represent concepts as directions in the latent space of hidden activations. However, for any human-interpretable concept, how can we find its direction in the latent space? We present a technique c
External link:
http://arxiv.org/abs/2311.08968
Author:
Chanin, David
While the state-of-the-art for frame semantic parsing has progressed dramatically in recent years, it is still difficult for end-users to apply state-of-the-art models in practice. To address this, we present Frame Semantic Transformer, an open-sourc
External link:
http://arxiv.org/abs/2303.12788
Authors:
Chanin, David, Hunter, Anthony
Social norms underlie all human social interactions, yet formalizing and reasoning with them remains a major challenge for AI systems. We present a novel system for taking social rules of thumb (ROTs) in natural language from the Social Chemistry 101
External link:
http://arxiv.org/abs/2303.08264