Showing 1 - 10 of 308 for search: '"Merrill, William A."'
How novel are texts generated by language models (LMs) relative to their training corpora? In this work, we investigate the extent to which modern LMs generate $n$-grams from their training data, evaluating both (i) the probability LMs assign to…
External link:
http://arxiv.org/abs/2406.13069
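The measurement this abstract describes can be made concrete. Below is a minimal sketch, assuming toy tokenized text rather than the paper's actual corpora or pipeline, of computing what fraction of generated $n$-grams never appear in the training data:

```python
def ngrams(tokens, n):
    """All contiguous n-grams of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def novelty(generated, corpus, n):
    """Fraction of generated n-grams NOT found in the training corpus."""
    seen = set(ngrams(corpus, n))
    gen = ngrams(generated, n)
    if not gen:
        return 0.0
    return sum(g not in seen for g in gen) / len(gen)

# Toy example: token lists stand in for real tokenized text.
corpus = "the cat sat on the mat".split()
generated = "the cat sat on the rug".split()
print(novelty(generated, corpus, n=3))  # 1 of 4 trigrams is novel -> 0.25
```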
Chain-of-thought responses from language models improve performance across most benchmarks. However, it remains unclear to what extent these performance gains can be attributed to human-like task decomposition or simply the greater computation that…
External link:
http://arxiv.org/abs/2404.15758
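One way to separate "human-like decomposition" from "more computation" is to keep the intermediate-token budget while destroying its content, e.g. replacing the reasoning chain with meaningless filler. A hedged sketch of that experimental contrast, where `query_lm` is a hypothetical stand-in for any LM completion API:

```python
QUESTION = "If Ann has 3 apples and buys 4 more, how many does she have?"

# Condition A: genuine chain of thought before the answer.
cot_prompt = QUESTION + "\nLet's think step by step:"

# Condition B: a similar token budget, but content-free filler tokens,
# so any gain must come from extra computation, not decomposition.
filler_prompt = QUESTION + "\n" + ". " * 20 + "Answer:"

def query_lm(prompt: str) -> str:
    """Hypothetical LM call; swap in any real completion API here."""
    raise NotImplementedError

for name, prompt in [("chain-of-thought", cot_prompt), ("filler", filler_prompt)]:
    print(f"--- {name} condition ---")
    print(prompt)
```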
State-space models (SSMs) have emerged as a potential alternative architecture for building large language models (LLMs) compared to the previously ubiquitous transformer architecture. One theoretical weakness of transformers is that they cannot…
External link:
http://arxiv.org/abs/2404.08819
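Permutation composition is a canonical example of the state tracking at issue in this line of work. A minimal sketch of the task itself (not of any model), assuming permutations over five elements:

```python
def compose(perms, n=5):
    """Running composition of permutations of {0, ..., n-1}.

    The answer after each step depends on a latent state that must be
    updated sequentially -- the kind of state tracking this line of
    work argues constant-depth models cannot express in general.
    """
    state = tuple(range(n))  # identity permutation
    for p in perms:
        state = tuple(state[i] for i in p)  # apply p to the current state
    return state

# A short word problem over S_5.
p1, p2, p3 = (1, 0, 2, 3, 4), (0, 2, 1, 3, 4), (4, 3, 2, 1, 0)
print(compose([p1, p2, p3]))  # (4, 3, 0, 2, 1)
```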
Do LMs infer the semantics of text from co-occurrence patterns in their training data? Merrill et al. (2022) argue that, in theory, sentence co-occurrence probabilities predicted by an optimal LM should reflect the entailment relationship of the…
External link:
http://arxiv.org/abs/2402.13956
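A hedged sketch of the kind of probe this suggests: score sentence pairs with an LM and ask whether entailing pairs co-occur differently than non-entailing ones. The exact test statistic from Merrill et al. (2022) is not reproduced here, and `logprob` is a hypothetical scoring function:

```python
def logprob(text: str) -> float:
    """Hypothetical: total log-probability an LM assigns to `text`."""
    raise NotImplementedError

def cooccurrence_score(x: str, y: str) -> float:
    """Pointwise mutual information under the LM:
    log p(xy) - log p(x) - log p(y)."""
    return logprob(x + " " + y) - logprob(x) - logprob(y)

# Compare an entailing pair against a non-entailing one.
pairs = [
    ("A dog is barking.", "An animal is making noise."),  # x entails y
    ("A dog is barking.", "The street is empty."),        # x does not entail y
]
```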
Author:
Groeneveld, Dirk, Beltagy, Iz, Walsh, Pete, Bhagia, Akshita, Kinney, Rodney, Tafjord, Oyvind, Jha, Ananya Harsh, Ivison, Hamish, Magnusson, Ian, Wang, Yizhong, Arora, Shane, Atkinson, David, Authur, Russell, Chandu, Khyathi Raghavi, Cohan, Arman, Dumas, Jennifer, Elazar, Yanai, Gu, Yuling, Hessel, Jack, Khot, Tushar, Merrill, William, Morrison, Jacob, Muennighoff, Niklas, Naik, Aakanksha, Nam, Crystal, Peters, Matthew E., Pyatkin, Valentina, Ravichander, Abhilasha, Schwenk, Dustin, Shah, Saurabh, Smith, Will, Strubell, Emma, Subramani, Nishant, Wortsman, Mitchell, Dasigi, Pradeep, Lambert, Nathan, Richardson, Kyle, Zettlemoyer, Luke, Dodge, Jesse, Lo, Kyle, Soldaini, Luca, Smith, Noah A., Hajishirzi, Hannaneh
Language models (LMs) have become ubiquitous in both NLP research and in commercial product offerings. As their commercial importance has surged, the most powerful models have become closed off, gated behind proprietary interfaces, with important…
External link:
http://arxiv.org/abs/2402.00838
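Because the model and tooling are openly released, loading it takes a few lines; a sketch assuming the "allenai/OLMo-7B" checkpoint on the Hugging Face hub and a recent transformers version:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumes the openly released checkpoint "allenai/OLMo-7B"; the exact
# model id and required flags may vary across transformers versions.
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-7B")
model = AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B")

inputs = tokenizer("Language modeling is", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```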
Published in:
Transactions of the Association for Computational Linguistics, 12:543-561, 2024
As transformers have gained prominence in natural language processing, some researchers have investigated theoretically what problems they can and cannot solve, by treating problems as formal languages. Exploring such questions can help clarify the…
External link:
http://arxiv.org/abs/2311.00208
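The "problems as formal languages" framing is concrete: a task becomes a yes/no membership question over strings. A minimal sketch using PARITY, a regular language frequently discussed in this literature (the definition below is one common variant):

```python
def in_parity(s: str) -> bool:
    """Membership in PARITY (here: binary strings with an even number
    of 1s), a regular language frequently used to probe the limits of
    transformer expressivity."""
    return s.count("1") % 2 == 0

assert in_parity("1100")
assert not in_parity("1000")
```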
Author:
Merrill, William, Sabharwal, Ashish
Recent theoretical work has identified surprisingly simple reasoning problems, such as checking if two nodes in a graph are connected or simulating finite-state machines, that are provably unsolvable by standard transformers that answer immediately…
External link:
http://arxiv.org/abs/2310.07923
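Graph connectivity is a useful anchor for why intermediate steps help: deciding it naturally proceeds one hop at a time, work a model emitting an immediate answer must somehow compress. A minimal reference implementation of the task itself:

```python
from collections import deque

def connected(edges, s, t):
    """Breadth-first search: are nodes s and t connected?

    Each iteration extends the frontier by one hop -- the kind of
    sequential work that intermediate (chain-of-thought) tokens can
    emulate but an immediate answer cannot.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, queue = {s}, deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False

print(connected([(0, 1), (1, 2), (3, 4)], 0, 2))  # True
print(connected([(0, 1), (1, 2), (3, 4)], 0, 4))  # False
```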
A major risk of using language models in practical applications is their tendency to hallucinate incorrect statements. Hallucinations are often attributed to knowledge gaps in LMs, but we hypothesize that in some cases, when justifying previously…
External link:
http://arxiv.org/abs/2305.13534
Grokking is a phenomenon where a model trained on an algorithmic task first overfits but, then, after a large amount of additional training, undergoes a phase transition to generalize perfectly. We empirically study the internal structure of networks…
External link:
http://arxiv.org/abs/2303.11873
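The "algorithmic task" in grokking studies is typically something like modular arithmetic with a held-out split; a hedged sketch of that dataset setup (the specific task and split in the paper may differ):

```python
import random

def modular_addition_dataset(p=97, train_frac=0.5, seed=0):
    """All pairs (a, b) -> (a + b) mod p, split into train/test.

    Grokking studies train well past the point of memorizing the train
    split and watch test accuracy jump much later.
    """
    pairs = [(a, b, (a + b) % p) for a in range(p) for b in range(p)]
    random.Random(seed).shuffle(pairs)
    cut = int(train_frac * len(pairs))
    return pairs[:cut], pairs[cut:]

train, test = modular_addition_dataset()
print(len(train), len(test))  # 4704 4705
```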
Many current NLP systems are built from language models trained to optimize unsupervised objectives on large amounts of raw text. Under what conditions might such a procedure acquire meaning? Our systematic experiments with synthetic data reveal that…
External link:
http://arxiv.org/abs/2210.07468