Showing 1 - 10 of 22
for the search: '"Svete, Anej"'
Understanding and manipulating the causal generation mechanisms in language models is essential for controlling their behavior. Previous work has primarily relied on techniques such as representation surgery -- e.g., model ablations or manipulation of…
External link:
http://arxiv.org/abs/2411.07180
Author:
Butoi, Alexandra, Khalighinejad, Ghazal, Svete, Anej, Valvoda, Josef, Cotterell, Ryan, DuSell, Brian
Characterizing the computational power of neural network architectures in terms of formal language theory remains a crucial line of research, as it describes lower and upper bounds on the reasoning capabilities of modern AI. However, when empirically…
External link:
http://arxiv.org/abs/2411.07107
Extracting finite state automata (FSAs) from black-box models offers a powerful approach to gaining interpretable insights into complex model behaviors. To support this pursuit, we present a weighted variant of Angluin's (1987) $\mathbf{L^*}$ algorithm…
External link:
http://arxiv.org/abs/2411.06228
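The entry above builds on Angluin's (1987) $\mathbf{L^*}$ algorithm. The following is a minimal sketch of the classic, unweighted L* learner from membership and equivalence queries, included only to illustrate that starting point; the weighted variant presented in the paper is not reproduced here. The target language (strings with an even number of 'a's), the alphabet, and the brute-force equivalence check are illustrative assumptions.

```python
from itertools import product

ALPHABET = "ab"

def member(w: str) -> bool:
    """Membership oracle for the assumed target language: even number of 'a's."""
    return w.count("a") % 2 == 0

def row(s, E, T):
    return tuple(T[s + e] for e in E)

def fill(S, E, T):
    """Fill the observation table for all prefixes in S and their one-letter extensions."""
    for s in set(S) | {s + a for s in S for a in ALPHABET}:
        for e in E:
            if s + e not in T:
                T[s + e] = member(s + e)

def close(S, E, T):
    """Add extensions to S until every row of S.ALPHABET also appears as a row of S."""
    while True:
        fill(S, E, T)
        rows_S = {row(s, E, T) for s in S}
        ext = next((s + a for s in S for a in ALPHABET
                    if row(s + a, E, T) not in rows_S), None)
        if ext is None:
            return
        S.append(ext)

def hypothesis(S, E, T):
    """Build a DFA whose states are the distinct rows of the closed table."""
    start = row("", E, T)
    accept = {row(s, E, T) for s in S if T[s]}
    delta = {(row(s, E, T), a): row(s + a, E, T) for s in S for a in ALPHABET}
    return start, accept, delta

def accepts(dfa, w):
    start, accept, delta = dfa
    q = start
    for a in w:
        q = delta[(q, a)]
    return q in accept

def counterexample(dfa, max_len=6):
    """Simulated equivalence query: compare hypothesis and target on short strings."""
    for n in range(max_len + 1):
        for w in map("".join, product(ALPHABET, repeat=n)):
            if accepts(dfa, w) != member(w):
                return w
    return None

def lstar():
    S, E, T = [""], [""], {}
    while True:
        close(S, E, T)
        dfa = hypothesis(S, E, T)
        cex = counterexample(dfa)
        if cex is None:
            return dfa
        # Handle the counterexample by adding all of its suffixes to E
        # (a standard variant that keeps the table consistent automatically).
        E.extend(e for e in {cex[i:] for i in range(len(cex) + 1)} if e not in E)

if __name__ == "__main__":
    start, accept, delta = lstar()
    print("learned DFA with", len({q for q, _ in delta}), "states")
```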
Much theoretical work has described the ability of transformers to represent formal languages. However, linking theoretical results to empirical performance is not straightforward due to the complex interplay between the architecture, the learning algorithm…
External link:
http://arxiv.org/abs/2410.03001
The performance of modern language models (LMs) has been improved by chain-of-thought (CoT) reasoning, i.e., the process of generating intermediate results that guide the model towards a final answer. A possible explanation for this improvement…
External link:
http://arxiv.org/abs/2406.14197
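One common way to make "intermediate results that guide the model towards a final answer" precise is to treat the chain of thought as a latent string generated between the question and the answer and then summed out; this is a standard framing shown only for concreteness, not necessarily the explanation the paper develops, and the symbols ($q$, $z$, $a$, $\Sigma$) are assumptions made here.

```latex
% Chain-of-thought viewed as a latent intermediate string z generated
% between the question q and the answer a (illustrative notation only):
\[
  p(a \mid q) \;=\; \sum_{z \in \Sigma^{*}} p(z \mid q)\, p(a \mid q, z).
\]
```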
Author:
Tan, Naaman, Valvoda, Josef, Liu, Tianyu, Svete, Anej, Qin, Yanxia, Min-Yen, Kan, Cotterell, Ryan
The relationship between the quality of a string, as judged by a human reader, and its probability, $p(\boldsymbol{y})$, under a language model undergirds the development of better language models. For example, many popular algorithms for sampling from…
External link:
http://arxiv.org/abs/2406.10203
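As a concrete illustration of the quantity $p(\boldsymbol{y})$ and of one popular sampling algorithm mentioned in the entry above, the sketch below scores strings under, and ancestrally samples strings from, a toy bigram model. The model, vocabulary, and boundary symbols are assumptions made here for illustration and are not taken from the paper.

```python
import math
import random

EOS = "</s>"
# Toy bigram conditionals p(next | previous); each row sums to 1.
BIGRAM = {
    "<s>":  {"the": 0.6, "a": 0.4},
    "the":  {"cat": 0.5, "dog": 0.3, EOS: 0.2},
    "a":    {"cat": 0.4, "dog": 0.4, EOS: 0.2},
    "cat":  {"sat": 0.5, EOS: 0.5},
    "dog":  {"sat": 0.4, EOS: 0.6},
    "sat":  {EOS: 1.0},
}

def log_prob(tokens):
    """log p(y) = sum_t log p(y_t | y_{t-1}), including the EOS step."""
    lp, prev = 0.0, "<s>"
    for tok in tokens + [EOS]:
        lp += math.log(BIGRAM[prev][tok])
        prev = tok
    return lp

def sample():
    """Ancestral sampling: draw y_t ~ p(. | y_{t-1}) until EOS is drawn."""
    tokens, prev = [], "<s>"
    while True:
        cands, probs = zip(*BIGRAM[prev].items())
        tok = random.choices(cands, weights=probs)[0]
        if tok == EOS:
            return tokens
        tokens.append(tok)
        prev = tok

if __name__ == "__main__":
    y = sample()
    print(" ".join(y), "| log p(y) =", round(log_prob(y), 3))
```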
Author:
Borenstein, Nadav, Svete, Anej, Chan, Robin, Valvoda, Josef, Nowak, Franz, Augenstein, Isabelle, Chodroff, Eleanor, Cotterell, Ryan
What can large language models learn? By definition, language models (LMs) are distributions over strings. Therefore, an intuitive way of addressing the above question is to formalize it as a matter of learnability of classes of distributions over strings…
External link:
http://arxiv.org/abs/2406.04289
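The premise in the entry above, that language models are distributions over strings, has a standard formalization, shown below for concreteness; the notation ($\Sigma$, $\boldsymbol{y}$, EOS) is assumed here, and the autoregressive factorization is only one common way of defining such a distribution.

```latex
% A language model as a probability distribution over strings y in Sigma^*
% (standard definition; notation assumed for illustration):
\[
  \sum_{\boldsymbol{y} \in \Sigma^{*}} p(\boldsymbol{y}) = 1,
  \qquad
  p(\boldsymbol{y}) \;=\; p(\mathrm{EOS} \mid \boldsymbol{y})
      \prod_{t=1}^{|\boldsymbol{y}|} p(y_t \mid \boldsymbol{y}_{<t}).
\]
```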
Author:
Chan, Robin SM, Boumasmoud, Reda, Svete, Anej, Ren, Yuxin, Guo, Qipeng, Jin, Zhijing, Ravfogel, Shauli, Sachan, Mrinmaya, Schölkopf, Bernhard, El-Assady, Mennatallah, Cotterell, Ryan
Pre-trained language encoders -- functions that represent text as vectors -- are an integral component of many NLP tasks. We tackle a natural question in language encoder analysis: What does it mean for two encoders to be similar? We contend that…
External link:
http://arxiv.org/abs/2406.02329
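To make the question "what does it mean for two encoders to be similar?" concrete, the sketch below computes one widely used representational similarity measure, linear CKA (Kornblith et al., 2019), between two sets of sentence embeddings. This is only an illustrative baseline notion, not the account the paper argues for; the random embeddings standing in for encoder outputs are assumptions made here.

```python
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between two embedding matrices of shape (n_sentences, dim)."""
    # Center each representation across the sample dimension.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # CKA(X, Y) = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(Y.T @ X, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return float(cross / (norm_x * norm_y))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-ins for the outputs of two encoders on the same 100 sentences.
    enc_a = rng.normal(size=(100, 64))
    enc_b = enc_a @ rng.normal(size=(64, 32))  # a linear map of encoder A's outputs
    print("CKA(A, A):", round(linear_cka(enc_a, enc_a), 3))  # identical encoders -> 1.0
    print("CKA(A, B):", round(linear_cka(enc_a, enc_b), 3))
```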
The recent successes and spread of large neural language models (LMs) call for a thorough understanding of their computational ability. Describing their computational abilities through LMs' \emph{representational capacity} is a lively area of research…
External link:
http://arxiv.org/abs/2405.19222
Author:
Svete, Anej, Cotterell, Ryan
Existing work has analyzed the representational capacity of the transformer architecture by means of formal models of computation. However, the focus so far has been on analyzing the architecture in terms of language \emph{acceptance}. We contend that…
External link:
http://arxiv.org/abs/2404.14994