Showing 1 - 10 of 22
for the search: '"Svete, Anej"'
Understanding and manipulating the causal generation mechanisms in language models is essential for controlling their behavior. Previous work has primarily relied on techniques such as representation surgery -- e.g., model ablations or manipulation of…
External link:
http://arxiv.org/abs/2411.07180
Author:
Butoi, Alexandra, Khalighinejad, Ghazal, Svete, Anej, Valvoda, Josef, Cotterell, Ryan, DuSell, Brian
Characterizing the computational power of neural network architectures in terms of formal language theory remains a crucial line of research, as it describes lower and upper bounds on the reasoning capabilities of modern AI. However, when empirically…
External link:
http://arxiv.org/abs/2411.07107
Extracting finite state automata (FSAs) from black-box models offers a powerful approach to gaining interpretable insights into complex model behaviors. To support this pursuit, we present a weighted variant of Angluin's (1987) $\mathbf{L^*}$ algorithm…
External link:
http://arxiv.org/abs/2411.06228
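The entry above builds on Angluin's (1987) $\mathbf{L^*}$ algorithm. The following is a minimal sketch of the classic, unweighted L* learner from membership and equivalence queries, included only to illustrate that starting point; the weighted variant presented in the paper is not reproduced here. The target language (strings with an even number of 'a's), the alphabet, and the brute-force equivalence check are illustrative assumptions.

```python
from itertools import product

ALPHABET = "ab"

def member(w: str) -> bool:
    """Membership oracle for the assumed target language: even number of 'a's."""
    return w.count("a") % 2 == 0

def row(s, E, T):
    return tuple(T[s + e] for e in E)

def fill(S, E, T):
    """Fill the observation table for all prefixes in S and their one-letter extensions."""
    for s in set(S) | {s + a for s in S for a in ALPHABET}:
        for e in E:
            if s + e not in T:
                T[s + e] = member(s + e)

def close(S, E, T):
    """Add extensions to S until every row of S.ALPHABET also appears as a row of S."""
    while True:
        fill(S, E, T)
        rows_S = {row(s, E, T) for s in S}
        ext = next((s + a for s in S for a in ALPHABET
                    if row(s + a, E, T) not in rows_S), None)
        if ext is None:
            return
        S.append(ext)

def hypothesis(S, E, T):
    """Build a DFA whose states are the distinct rows of the closed table."""
    start = row("", E, T)
    accept = {row(s, E, T) for s in S if T[s]}
    delta = {(row(s, E, T), a): row(s + a, E, T) for s in S for a in ALPHABET}
    return start, accept, delta

def accepts(dfa, w):
    start, accept, delta = dfa
    q = start
    for a in w:
        q = delta[(q, a)]
    return q in accept

def counterexample(dfa, max_len=6):
    """Simulated equivalence query: compare hypothesis and target on short strings."""
    for n in range(max_len + 1):
        for w in map("".join, product(ALPHABET, repeat=n)):
            if accepts(dfa, w) != member(w):
                return w
    return None

def lstar():
    S, E, T = [""], [""], {}
    while True:
        close(S, E, T)
        dfa = hypothesis(S, E, T)
        cex = counterexample(dfa)
        if cex is None:
            return dfa
        # Handle the counterexample by adding all of its suffixes to E
        # (a standard variant that keeps the table consistent automatically).
        E.extend(e for e in {cex[i:] for i in range(len(cex) + 1)} if e not in E)

if __name__ == "__main__":
    start, accept, delta = lstar()
    print("learned DFA with", len({q for q, _ in delta}), "states")
```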
Much theoretical work has described the ability of transformers to represent formal languages. However, linking theoretical results to empirical performance is not straightforward due to the complex interplay between the architecture, the learning algorithm…
External link:
http://arxiv.org/abs/2410.03001
The performance of modern language models (LMs) has been improved by chain-of-thought (CoT) reasoning, i.e., the process of generating intermediate results that guide the model towards a final answer. A possible explanation for this improvement…
External link:
http://arxiv.org/abs/2406.14197
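One common way to make "intermediate results that guide the model towards a final answer" precise is to treat the chain of thought as a latent string generated between the question and the answer and then summed out; this is a standard framing shown only for concreteness, not necessarily the explanation the paper develops, and the symbols ($q$, $z$, $a$, $\Sigma$) are assumptions made here.

```latex
% Chain-of-thought viewed as a latent intermediate string z generated
% between the question q and the answer a (illustrative notation only):
\[
  p(a \mid q) \;=\; \sum_{z \in \Sigma^{*}} p(z \mid q)\, p(a \mid q, z).
\]
```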
Author:
Tan, Naaman, Valvoda, Josef, Liu, Tianyu, Svete, Anej, Qin, Yanxia, Min-Yen, Kan, Cotterell, Ryan
The relationship between the quality of a string, as judged by a human reader, and its probability, $p(\boldsymbol{y})$, under a language model undergirds the development of better language models. For example, many popular algorithms for sampling from…
External link:
http://arxiv.org/abs/2406.10203
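As a concrete illustration of the quantity $p(\boldsymbol{y})$ and of one popular sampling algorithm mentioned in the entry above, the sketch below scores strings under, and ancestrally samples strings from, a toy bigram model. The model, vocabulary, and boundary symbols are assumptions made here for illustration and are not taken from the paper.

```python
import math
import random

EOS = "</s>"
# Toy bigram conditionals p(next | previous); each row sums to 1.
BIGRAM = {
    "<s>":  {"the": 0.6, "a": 0.4},
    "the":  {"cat": 0.5, "dog": 0.3, EOS: 0.2},
    "a":    {"cat": 0.4, "dog": 0.4, EOS: 0.2},
    "cat":  {"sat": 0.5, EOS: 0.5},
    "dog":  {"sat": 0.4, EOS: 0.6},
    "sat":  {EOS: 1.0},
}

def log_prob(tokens):
    """log p(y) = sum_t log p(y_t | y_{t-1}), including the EOS step."""
    lp, prev = 0.0, "<s>"
    for tok in tokens + [EOS]:
        lp += math.log(BIGRAM[prev][tok])
        prev = tok
    return lp

def sample():
    """Ancestral sampling: draw y_t ~ p(. | y_{t-1}) until EOS is drawn."""
    tokens, prev = [], "<s>"
    while True:
        cands, probs = zip(*BIGRAM[prev].items())
        tok = random.choices(cands, weights=probs)[0]
        if tok == EOS:
            return tokens
        tokens.append(tok)
        prev = tok

if __name__ == "__main__":
    y = sample()
    print(" ".join(y), "| log p(y) =", round(log_prob(y), 3))
```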
Author:
Borenstein, Nadav, Svete, Anej, Chan, Robin, Valvoda, Josef, Nowak, Franz, Augenstein, Isabelle, Chodroff, Eleanor, Cotterell, Ryan
What can large language models learn? By definition, language models (LMs) are distributions over strings. Therefore, an intuitive way of addressing the above question is to formalize it as a matter of learnability of classes of distributions over strings…
External link:
http://arxiv.org/abs/2406.04289
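The premise in the entry above, that language models are distributions over strings, has a standard formalization, shown below for concreteness; the notation ($\Sigma$, $\boldsymbol{y}$, EOS) is assumed here, and the autoregressive factorization is only one common way of defining such a distribution.

```latex
% A language model as a probability distribution over strings y in Sigma^*
% (standard definition; notation assumed for illustration):
\[
  \sum_{\boldsymbol{y} \in \Sigma^{*}} p(\boldsymbol{y}) = 1,
  \qquad
  p(\boldsymbol{y}) \;=\; p(\mathrm{EOS} \mid \boldsymbol{y})
      \prod_{t=1}^{|\boldsymbol{y}|} p(y_t \mid \boldsymbol{y}_{<t}).
\]
```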
Author:
Chan, Robin SM, Boumasmoud, Reda, Svete, Anej, Ren, Yuxin, Guo, Qipeng, Jin, Zhijing, Ravfogel, Shauli, Sachan, Mrinmaya, Schölkopf, Bernhard, El-Assady, Mennatallah, Cotterell, Ryan
Pre-trained language encoders -- functions that represent text as vectors -- are an integral component of many NLP tasks. We tackle a natural question in language encoder analysis: What does it mean for two encoders to be similar? We contend that…
External link:
http://arxiv.org/abs/2406.02329
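To make the question "what does it mean for two encoders to be similar?" concrete, the sketch below computes one widely used representational similarity measure, linear CKA (Kornblith et al., 2019), between two sets of sentence embeddings. This is only an illustrative baseline notion, not the account the paper argues for; the random embeddings standing in for encoder outputs are assumptions made here.

```python
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between two embedding matrices of shape (n_sentences, dim)."""
    # Center each representation across the sample dimension.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # CKA(X, Y) = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(Y.T @ X, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return float(cross / (norm_x * norm_y))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-ins for the outputs of two encoders on the same 100 sentences.
    enc_a = rng.normal(size=(100, 64))
    enc_b = enc_a @ rng.normal(size=(64, 32))  # a linear map of encoder A's outputs
    print("CKA(A, A):", round(linear_cka(enc_a, enc_a), 3))  # identical encoders -> 1.0
    print("CKA(A, B):", round(linear_cka(enc_a, enc_b), 3))
```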
The recent successes and spread of large neural language models (LMs) call for a thorough understanding of their computational ability. Describing their computational abilities through LMs' \emph{representational capacity} is a lively area of research…
External link:
http://arxiv.org/abs/2405.19222
Author:
Svete, Anej, Cotterell, Ryan
Existing work has analyzed the representational capacity of the transformer architecture by means of formal models of computation. However, the focus so far has been on analyzing the architecture in terms of language \emph{acceptance}. We contend that…
External link:
http://arxiv.org/abs/2404.14994