Showing 1 - 2 of 2 for search: '"Marion, Max"'
Author:
Ankner, Zachary, Blakeney, Cody, Sreenivasan, Kartik, Marion, Max, Leavitt, Matthew L., Paul, Mansheej
In this work, we investigate whether small language models can determine high-quality subsets of large-scale text datasets that improve the performance of larger language models. While existing work has shown that pruning based on the perplexity of a
External link:
http://arxiv.org/abs/2405.20541
Large volumes of text data have contributed significantly to the development of large language models (LLMs) in recent years. This data is typically acquired by scraping the internet, leading to pretraining datasets comprised of noisy web text. To da
External link:
http://arxiv.org/abs/2309.04564