Showing 1 - 10 of 4,940 for search: '"P. Arnett"'
Open-source large language models are becoming increasingly available and popular among researchers and practitioners. While significant progress has been made on open-weight models, open training data is a practice yet to be adopted by the leading open…
External link: http://arxiv.org/abs/2410.22587
Language models can benefit greatly from efficient tokenization. However, they still mostly use the classical BPE algorithm, a simple and reliable method. This has been shown to cause issues such as under-trained tokens and sub-optimal compression…
External link: http://arxiv.org/abs/2409.04599
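For context on the BPE snippet above: a minimal sketch of the classical BPE training loop it refers to, written against a toy whitespace-split corpus (the corpus and merge count are illustrative, not from the paper).

    # Minimal classical BPE training loop on a toy corpus (illustrative only).
    from collections import Counter

    def train_bpe(corpus, num_merges):
        """Learn BPE merges by repeatedly fusing the most frequent symbol pair."""
        # Represent each word as a tuple of single-character symbols.
        vocab = Counter(tuple(word) for word in corpus.split())
        merges = []
        for _ in range(num_merges):
            # Count adjacent symbol pairs, weighted by word frequency.
            pairs = Counter()
            for word, freq in vocab.items():
                for pair in zip(word, word[1:]):
                    pairs[pair] += freq
            if not pairs:
                break
            best = max(pairs, key=pairs.get)
            merges.append(best)
            # Rewrite every word with the chosen pair fused into one symbol.
            new_vocab = Counter()
            for word, freq in vocab.items():
                symbols, i = [], 0
                while i < len(word):
                    if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                        symbols.append(word[i] + word[i + 1])
                        i += 2
                    else:
                        symbols.append(word[i])
                        i += 1
                new_vocab[tuple(symbols)] += freq
            vocab = new_vocab
        return merges

    # Late merges rest on very few occurrences, one route to under-trained tokens.
    print(train_bpe("low lower lowest low low newer newest", num_merges=10))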
For many low-resource languages, the only available language models are large multilingual models trained on many languages simultaneously. However, using FLORES perplexity as a metric, we find that these models perform worse than bigrams for many languages…
External link: http://arxiv.org/abs/2408.10441
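To make the bigram baseline in the snippet above concrete, here is a hedged sketch: a character-level bigram model with add-one smoothing, scored as perplexity. The paper's actual setup (FLORES data, unit of prediction, smoothing choices) may well differ.

    # Add-one-smoothed character bigram perplexity (sketch; toy data below).
    import math
    from collections import Counter

    def bigram_perplexity(train_text, test_text):
        vocab = set(train_text) | set(test_text)
        V = len(vocab)
        bigrams = Counter(zip(train_text, train_text[1:]))
        contexts = Counter(train_text[:-1])
        total_log_prob, n = 0.0, 0
        for a, b in zip(test_text, test_text[1:]):
            # P(b | a) with add-one smoothing over the character vocabulary.
            p = (bigrams[(a, b)] + 1) / (contexts[a] + V)
            total_log_prob += math.log(p)
            n += 1
        return math.exp(-total_log_prob / n)

    print(bigram_perplexity("the cat sat on the mat", "the cat ate the mat"))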
Authors: Rizzuti, Federico, Hirschi, Raphael, Varma, Vishnu, Arnett, William David, Georgy, Cyril, Meakin, Casey, Mocák, Miroslav, Murphy, Alexander St. John, Rauscher, Thomas
One-dimensional (1D) stellar evolution models are widely used across various astrophysical fields; however, they still suffer from significant uncertainties that deeply affect their predictive power. Among these, the merging of independent convective…
External link: http://arxiv.org/abs/2407.15544
Authors: Georgy, C., Rizzuti, F., Hirschi, R., Varma, V., Arnett, W. D., Meakin, C., Mocak, M., Murphy, A. StJ., Rauscher, T.
The treatment of convection remains a major weakness in the modelling of stellar evolution with one-dimensional (1D) codes. Ever-increasing computing power now makes it possible to simulate part of a star in 3D for a fraction of its lifetime, allowing us…
External link: http://arxiv.org/abs/2405.21033
Transformers have generally supplanted recurrent neural networks as the dominant architecture both for natural language processing tasks and for modelling the effect of predictability on online human language comprehension. However, two recently developed…
External link: http://arxiv.org/abs/2404.19178
The relationship between language model tokenization and performance is an open area of research. Here, we investigate how different tokenization schemes impact number agreement in Spanish plurals. We find that morphologically-aligned tokenization performs…
External link: http://arxiv.org/abs/2403.13754
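As a purely hypothetical illustration of "morphologically-aligned" tokenization in the snippet above: two possible segmentations of a Spanish plural, one respecting the stem/suffix boundary and one crossing it. Both segmentations are invented for illustration, not output of the paper's actual tokenizers.

    # Two hypothetical segmentations of the Spanish plural "gatos" ("cats").
    morph_aligned = ["gato", "s"]  # stem + plural suffix kept intact
    bpe_style = ["ga", "tos"]      # frequency-driven split crossing the morpheme boundary

    # The question studied is whether splits like the first make plural number
    # agreement easier for a language model to track than splits like the second.
    print(morph_aligned, bpe_style)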
How should text dataset sizes be compared across languages? Even for content-matched (parallel) corpora, UTF-8 encoded text can require a dramatically different number of bytes for different languages. In our work, we define the byte premium between two languages…
External link: http://arxiv.org/abs/2403.00686
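The byte premium in the snippet above reduces to a simple ratio; a minimal sketch, assuming it is the ratio of UTF-8 byte counts for content-matched (parallel) texts (the paper's exact estimator may differ):

    def byte_premium(text_lang_a: str, text_lang_b: str) -> float:
        """Bytes needed for language A's text per byte of language B's parallel text."""
        return len(text_lang_a.encode("utf-8")) / len(text_lang_b.encode("utf-8"))

    # Parallel pair (Hindi / English): Devanagari characters take 3 bytes each
    # in UTF-8, so the premium comes out well above 1.
    print(byte_premium("नमस्ते दुनिया", "hello world"))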
Multilingual language models are widely used to extend NLP systems to low-resource languages. However, concrete evidence for the effects of multilinguality on language modeling performance in individual languages remains scarce. Here, we pre-train over…
External link: http://arxiv.org/abs/2311.09205
Structural Priming Demonstrates Abstract Grammatical Representations in Multilingual Language Models
Abstract grammatical knowledge - of parts of speech and grammatical patterns - is key to the capacity for linguistic generalization in humans. But how abstract is grammatical knowledge in large language models? In the human literature, compelling evidence…
External link: http://arxiv.org/abs/2311.09194