Showing 1 - 10 of 3,091 for search: '"A Pavlick"'
Author:
Khandelwal, Apoorv, Yun, Tian, Nayak, Nihal V., Merullo, Jack, Bach, Stephen H., Sun, Chen, Pavlick, Ellie
Pre-training is notoriously compute-intensive and academic researchers are notoriously under-resourced. It is, therefore, commonly assumed that academics can't pre-train models. In this paper, we seek to clarify this assumption. We first survey…
External link:
http://arxiv.org/abs/2410.23261
Distributional semantics is the linguistic theory that a word's meaning can be derived from its distribution in natural language (i.e., its use). Language models are commonly viewed as an implementation of distributional semantics, as they are optimized…
External link:
http://arxiv.org/abs/2410.13984
We employ new tools from mechanistic interpretability in order to ask whether the internal structure of large language models (LLMs) shows correspondence to the linguistic structures which underlie the languages on which they are trained. In particular…
External link:
http://arxiv.org/abs/2410.09223
Author:
Lepori, Michael A., Tartaglini, Alexa R., Vong, Wai Keen, Serre, Thomas, Lake, Brenden M., Pavlick, Ellie
Though vision transformers (ViTs) have achieved state-of-the-art performance in a variety of settings, they exhibit surprising failures when performing tasks involving visual relations. This begs the question: how do ViTs attempt to perform tasks that…
External link:
http://arxiv.org/abs/2406.15955
Analogical reasoning is considered core to human learning and cognition. Recent studies have compared the analogical reasoning abilities of human subjects and Large Language Models (LLMs) on abstract symbol manipulation tasks, such as letter string…
External link:
http://arxiv.org/abs/2406.13803
Although it is known that transformer language models (LMs) pass features from early layers to later layers, it is not well understood how this information is represented and routed by the model. We analyze a mechanism used in two LMs to selectively…
External link:
http://arxiv.org/abs/2406.09519
Language models have the ability to perform in-context learning (ICL), allowing them to flexibly adapt their behavior based on context. This contrasts with in-weights learning, where information is statically encoded in model parameters from iterated…
External link:
http://arxiv.org/abs/2406.00053
Author:
Biderman, Stella, Schoelkopf, Hailey, Sutawika, Lintang, Gao, Leo, Tow, Jonathan, Abbasi, Baber, Aji, Alham Fikri, Ammanamanchi, Pawan Sasanka, Black, Sidney, Clive, Jordan, DiPofi, Anthony, Etxaniz, Julen, Fattori, Benjamin, Forde, Jessica Zosa, Foster, Charles, Hsu, Jeffrey, Jaiswal, Mimansa, Lee, Wilson Y., Li, Haonan, Lovering, Charles, Muennighoff, Niklas, Pavlick, Ellie, Phang, Jason, Skowron, Aviya, Tan, Samson, Tang, Xiangru, Wang, Kevin A., Winata, Genta Indra, Yvon, François, Zou, Andy
Effective evaluation of language models remains an open challenge in NLP. Researchers and engineers face methodological issues such as the sensitivity of models to evaluation setup, difficulty of proper comparisons across methods, and the lack of…
External link:
http://arxiv.org/abs/2405.14782
Many pretrained multilingual models exhibit cross-lingual transfer ability, which is often attributed to a learned language-neutral representation during pretraining. However, it remains unclear what factors contribute to the learning of a language-neutral…
External link:
http://arxiv.org/abs/2404.12444
Author:
Handa, Kunal, Gal, Yarin, Pavlick, Ellie, Goodman, Noah, Andreas, Jacob, Tamkin, Alex, Li, Belinda Z.
Aligning AI systems to users' interests requires understanding and incorporating humans' complex values and preferences. Recently, language models (LMs) have been used to gather information about the preferences of human users. This preference data…
External link:
http://arxiv.org/abs/2403.05534