Showing 1 - 10 of 41 for the search: '"Arora, Aryaman"'
Author:
Wu, Zhengxuan, Arora, Aryaman, Wang, Zheng, Geiger, Atticus, Jurafsky, Dan, Manning, Christopher D., Potts, Christopher
Parameter-efficient finetuning (PEFT) methods seek to adapt large neural models via updates to a small number of weights. However, much prior interpretability work has shown that representations encode rich semantic information, suggesting that editing …
External link:
http://arxiv.org/abs/2404.03592
Author:
Wu, Zhengxuan, Geiger, Atticus, Arora, Aryaman, Huang, Jing, Wang, Zheng, Goodman, Noah D., Manning, Christopher D., Potts, Christopher
Interventions on model-internal states are fundamental operations in many areas of AI, including model editing, steering, robustness, and interpretability. To facilitate such research, we introduce $\textbf{pyvene}$, an open-source Python library that …
External link:
http://arxiv.org/abs/2403.07809
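The interventions on model-internal states that the pyvene abstract describes can be illustrated generically. Below is a minimal numpy sketch of one such operation, an interchange intervention: part of a hidden state computed from a base input is overwritten with the corresponding activations from a source input. This is an illustrative toy only; the model, function names, and dimensions are invented here and do not reflect pyvene's actual API.

```python
import numpy as np

# Toy two-layer network: h = relu(W1 x), y = W2 h.
# All names and shapes are illustrative, not taken from pyvene.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 4))
W2 = rng.normal(size=(2, 4))

def hidden(x):
    # Model-internal state we will intervene on.
    return np.maximum(W1 @ x, 0.0)

def forward_intervened(base_x, source_x, dims):
    # Interchange intervention: run the base input, but splice in the
    # source input's hidden activations at the coordinates in `dims`.
    h = hidden(base_x)
    h[dims] = hidden(source_x)[dims]
    return W2 @ h

base, source = rng.normal(size=4), rng.normal(size=4)
y = forward_intervened(base, source, dims=[0, 1])
print(y.shape)  # (2,)
```

Swapping the hidden state with itself (`source_x = base_x`) leaves the output unchanged, which is a convenient sanity check for intervention code.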
Language models (LMs) have proven to be powerful tools for psycholinguistic research, but most prior work has focused on purely behavioural measures (e.g., surprisal comparisons). At the same time, research in model interpretability has begun to illuminate …
External link:
http://arxiv.org/abs/2402.12560
Author:
San, Nay, Paraskevopoulos, Georgios, Arora, Aryaman, He, Xiluo, Kaur, Prabhjot, Adams, Oliver, Jurafsky, Dan
While massively multilingual speech models like wav2vec 2.0 XLSR-128 can be directly fine-tuned for automatic speech recognition (ASR), downstream performance can still be relatively poor on languages that are under-represented in the pre-training data …
External link:
http://arxiv.org/abs/2402.02302
Author:
Wu, Zhengxuan, Geiger, Atticus, Huang, Jing, Arora, Aryaman, Icard, Thomas, Potts, Christopher, Goodman, Noah D.
We respond to the recent paper by Makelov et al. (2023), which reviews subspace interchange intervention methods like distributed alignment search (DAS; Geiger et al. 2023) and claims that these methods potentially cause "interpretability illusions".
External link:
http://arxiv.org/abs/2401.12631
Author:
Prasanna, Kabilan, Arora, Aryaman
Tamil, a Dravidian language of South Asia, is a highly diglossic language with two very different registers in everyday use: Literary Tamil (preferred in writing and formal communication) and Spoken Tamil (confined to speech and informal media). …
External link:
http://arxiv.org/abs/2311.07804
Mechanistic interpretability seeks to understand the neural mechanisms that enable specific behaviors in Large Language Models (LLMs) by leveraging causality-based methods. While these approaches have identified neural circuits that copy spans of text …
External link:
http://arxiv.org/abs/2308.14179
We introduce Jambu, a cognate database of South Asian languages which unifies dozens of previous sources in a structured and accessible format. The database includes 287k lemmata from 602 lects, grouped together in 23k sets of cognates. We outline the …
External link:
http://arxiv.org/abs/2306.02514
CGELBank is a treebank and associated tools based on a syntactic formalism for English derived from the Cambridge Grammar of the English Language. This document lays out the particularities of the CGELBank annotation scheme.
External link:
http://arxiv.org/abs/2305.17347
Localizing behaviors of neural networks to a subset of the network's components or a subset of interactions between components is a natural first step towards analyzing network mechanisms and possible failure modes. Existing work is often qualitative …
External link:
http://arxiv.org/abs/2304.05969