Showing 1 - 10 of 3,091 for search: '"A Pavlick"'
Author:
Khandelwal, Apoorv, Yun, Tian, Nayak, Nihal V., Merullo, Jack, Bach, Stephen H., Sun, Chen, Pavlick, Ellie
Pre-training is notoriously compute-intensive and academic researchers are notoriously under-resourced. It is, therefore, commonly assumed that academics can't pre-train models. In this paper, we seek to clarify this assumption. We first survey…
External link:
http://arxiv.org/abs/2410.23261
Distributional semantics is the linguistic theory that a word's meaning can be derived from its distribution in natural language (i.e., its use). Language models are commonly viewed as an implementation of distributional semantics, as they are optimized…
External link:
http://arxiv.org/abs/2410.13984
We employ new tools from mechanistic interpretability in order to ask whether the internal structure of large language models (LLMs) shows correspondence to the linguistic structures which underlie the languages on which they are trained. In particular…
External link:
http://arxiv.org/abs/2410.09223
Author:
Lepori, Michael A., Tartaglini, Alexa R., Vong, Wai Keen, Serre, Thomas, Lake, Brenden M., Pavlick, Ellie
Though vision transformers (ViTs) have achieved state-of-the-art performance in a variety of settings, they exhibit surprising failures when performing tasks involving visual relations. This begs the question: how do ViTs attempt to perform tasks that…
External link:
http://arxiv.org/abs/2406.15955
Analogical reasoning is considered core to human learning and cognition. Recent studies have compared the analogical reasoning abilities of human subjects and Large Language Models (LLMs) on abstract symbol manipulation tasks, such as letter string…
External link:
http://arxiv.org/abs/2406.13803
Although it is known that transformer language models (LMs) pass features from early layers to later layers, it is not well understood how this information is represented and routed by the model. We analyze a mechanism used in two LMs to selectively…
External link:
http://arxiv.org/abs/2406.09519
Language models have the ability to perform in-context learning (ICL), allowing them to flexibly adapt their behavior based on context. This contrasts with in-weights learning, where information is statically encoded in model parameters from iterated…
External link:
http://arxiv.org/abs/2406.00053
Author:
Biderman, Stella, Schoelkopf, Hailey, Sutawika, Lintang, Gao, Leo, Tow, Jonathan, Abbasi, Baber, Aji, Alham Fikri, Ammanamanchi, Pawan Sasanka, Black, Sidney, Clive, Jordan, DiPofi, Anthony, Etxaniz, Julen, Fattori, Benjamin, Forde, Jessica Zosa, Foster, Charles, Hsu, Jeffrey, Jaiswal, Mimansa, Lee, Wilson Y., Li, Haonan, Lovering, Charles, Muennighoff, Niklas, Pavlick, Ellie, Phang, Jason, Skowron, Aviya, Tan, Samson, Tang, Xiangru, Wang, Kevin A., Winata, Genta Indra, Yvon, François, Zou, Andy
Effective evaluation of language models remains an open challenge in NLP. Researchers and engineers face methodological issues such as the sensitivity of models to evaluation setup, difficulty of proper comparisons across methods, and the lack of…
External link:
http://arxiv.org/abs/2405.14782
Many pretrained multilingual models exhibit cross-lingual transfer ability, which is often attributed to a learned language-neutral representation during pretraining. However, it remains unclear what factors contribute to the learning of a language-neutral…
External link:
http://arxiv.org/abs/2404.12444
Author:
Handa, Kunal, Gal, Yarin, Pavlick, Ellie, Goodman, Noah, Andreas, Jacob, Tamkin, Alex, Li, Belinda Z.
Aligning AI systems to users' interests requires understanding and incorporating humans' complex values and preferences. Recently, language models (LMs) have been used to gather information about the preferences of human users. This preference data…
External link:
http://arxiv.org/abs/2403.05534