Showing 1 - 10 of 477 for search: '"Warstadt"'
Author:
Hu, Michael Y., Mueller, Aaron, Ross, Candace, Williams, Adina, Linzen, Tal, Zhuang, Chengxu, Cotterell, Ryan, Choshen, Leshem, Warstadt, Alex, Wilcox, Ethan Gotlieb
The BabyLM Challenge is a community effort to close the data-efficiency gap between human and computational language learners. Participants compete to optimize language model training on a fixed language data budget of 100 million words or less. This…
External link:
http://arxiv.org/abs/2412.05149
Author:
Tsipidi, Eleftheria, Nowak, Franz, Cotterell, Ryan, Wilcox, Ethan, Giulianelli, Mario, Warstadt, Alex
The Uniform Information Density (UID) hypothesis posits that speakers tend to distribute information evenly across linguistic units to achieve efficient communication. Of course, information rate in texts and discourses is not perfectly uniform. While…
External link:
http://arxiv.org/abs/2410.16062
Humans appear to have a critical period (CP) for language acquisition: Second language (L2) acquisition becomes harder after early childhood, and ceasing exposure to a first language (L1) after this period (but not before) typically does not lead to…
External link:
http://arxiv.org/abs/2407.19325
Author:
Warstadt, Alex, Parrish, Alicia, Liu, Haokun, Mohananey, Anhad, Peng, Wei, Wang, Sheng-Fu, Bowman, Samuel R.
Published in:
Transactions of the Association for Computational Linguistics, Vol 8, Pp 377-392 (2020)
We introduce The Benchmark of Linguistic Minimal Pairs (BLiMP)…
External link:
https://doaj.org/article/790c378bb37a4835b5207578e84e0795
Published in:
Transactions of the Association for Computational Linguistics, Vol 7, Pp 625-641 (2019)
This paper investigates the ability of artificial neural networks to judge the grammatical acceptability of a sentence, with the goal of testing their linguistic competence. We introduce the Corpus of Linguistic Acceptability (CoLA), a set of 10,657…
External link:
https://doaj.org/article/63af6d0d536b461091e6c964ad4cfee5
Author:
Choshen, Leshem, Cotterell, Ryan, Hu, Michael Y., Linzen, Tal, Mueller, Aaron, Ross, Candace, Warstadt, Alex, Wilcox, Ethan, Williams, Adina, Zhuang, Chengxu
After last year's successful BabyLM Challenge, the competition will be hosted again in 2024/2025. The overarching goals of the challenge remain the same; however, some of the competition rules will be different. The big changes for this year's competition…
External link:
http://arxiv.org/abs/2404.06214
Published in:
LREC-Coling 2024, May 2024, Turin, Italy
The acquisition of grammar has been a central question to adjudicate between theories of language acquisition. In order to conduct faster, more reproducible, and larger-scale corpus studies on grammaticality in child-caregiver conversations, tools for…
External link:
http://arxiv.org/abs/2403.14208
Author:
Amariucai, Theodor, Warstadt, Alex
In contrast to children, language models (LMs) exhibit considerably inferior data efficiency when acquiring language. In this submission to the BabyLM Challenge (Warstadt et al., 2023), we test the hypothesis that this data efficiency gap is partly…
External link:
http://arxiv.org/abs/2402.17936
Author:
Wolf, Lukas, Tuckute, Greta, Kotar, Klemen, Hosseini, Eghbal, Regev, Tamar, Wilcox, Ethan, Warstadt, Alex
Training on multiple modalities of input can augment the capabilities of a language model. Here, we ask whether such a training regime can improve the quality and efficiency of these systems as well. We focus on text-audio and introduce Whisbert…
External link:
http://arxiv.org/abs/2312.02931
The linear subspace hypothesis (Bolukbasi et al., 2016) states that, in a language model's representation space, all information about a concept such as verbal number is encoded in a linear subspace. Prior work has relied on auxiliary classification…
External link:
http://arxiv.org/abs/2307.15054