Showing 1 - 10 of 3,584 for search: '"Cotterell, A."'
Author:
Hu, Michael Y., Mueller, Aaron, Ross, Candace, Williams, Adina, Linzen, Tal, Zhuang, Chengxu, Cotterell, Ryan, Choshen, Leshem, Warstadt, Alex, Wilcox, Ethan Gotlieb
The BabyLM Challenge is a community effort to close the data-efficiency gap between human and computational language learners. Participants compete to optimize language model training on a fixed language data budget of 100 million words or less. This…
External link:
http://arxiv.org/abs/2412.05149
Author:
Vieira, Tim, LeBrun, Ben, Giulianelli, Mario, Gastaldi, Juan Luis, DuSell, Brian, Terilla, John, O'Donnell, Timothy J., Cotterell, Ryan
Modern language models are internally -- and mathematically -- distributions over token strings rather than \emph{character} strings, posing numerous challenges for programmers building user applications on top of them. For example, if a prompt is sp…
External link:
http://arxiv.org/abs/2412.03719
Recent work finds that retrieval-augmented generation with large language models is prone to be influenced by the order of retrieved documents in the context. However, the lack of in-depth analysis limits the use of this phenomenon for prompt engineering…
External link:
http://arxiv.org/abs/2411.07773
Author:
Minder, Julian, Du, Kevin, Stoehr, Niklas, Monea, Giovanni, Wendler, Chris, West, Robert, Cotterell, Ryan
When making predictions, a language model must trade off how much it relies on its context vs. its prior knowledge. Choosing how sensitive the model is to its context is a fundamental functionality, as it enables the model to excel at tasks like retrieval…
External link:
http://arxiv.org/abs/2411.07404
Understanding and manipulating the causal generation mechanisms in language models is essential for controlling their behavior. Previous work has primarily relied on techniques such as representation surgery -- e.g., model ablations or manipulation of…
External link:
http://arxiv.org/abs/2411.07180
Author:
Butoi, Alexandra, Khalighinejad, Ghazal, Svete, Anej, Valvoda, Josef, Cotterell, Ryan, DuSell, Brian
Characterizing the computational power of neural network architectures in terms of formal language theory remains a crucial line of research, as it describes lower and upper bounds on the reasoning capabilities of modern AI. However, when empirically…
External link:
http://arxiv.org/abs/2411.07107
Extracting finite state automata (FSAs) from black-box models offers a powerful approach to gaining interpretable insights into complex model behaviors. To support this pursuit, we present a weighted variant of Angluin's (1987) $\mathbf{L^*}$ algorithm…
External link:
http://arxiv.org/abs/2411.06228
Author:
Tsipidi, Eleftheria, Nowak, Franz, Cotterell, Ryan, Wilcox, Ethan, Giulianelli, Mario, Warstadt, Alex
The Uniform Information Density (UID) hypothesis posits that speakers tend to distribute information evenly across linguistic units to achieve efficient communication. Of course, information rate in texts and discourses is not perfectly uniform. While…
External link:
http://arxiv.org/abs/2410.16062
One strength of modern language models is their ability to incorporate information from a user-input context when answering queries. However, they are not equally sensitive to subtle changes in that context. To quantify this, Du et al. (2024) give…
External link:
http://arxiv.org/abs/2410.14361
Numerous previous studies have sought to determine to what extent language models, pretrained on natural language text, can serve as useful models of human cognition. In this paper, we are interested in the opposite question: whether we can directly…
External link:
http://arxiv.org/abs/2410.13086