Showing 1 - 10 of 6,291
for search: '"Raffel, A"'
The influence of contextual input on the behavior of large language models (LLMs) has prompted the development of context attribution methods that aim to quantify each context span's effect on an LLM's generations. The leave-one-out (LOO) error, which …
External link:
http://arxiv.org/abs/2411.15102
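The snippet above only names the leave-one-out idea, so here is a minimal sketch of LOO-style context attribution in Python, assuming a caller-supplied score function that returns the model's log-likelihood of a response given a list of context spans; the function names and the toy scorer are illustrative stand-ins, not the paper's implementation.

    from typing import Callable, List

    def loo_attributions(
        spans: List[str],
        response: str,
        score: Callable[[List[str], str], float],
    ) -> List[float]:
        # Leave-one-out attribution: for each context span, the drop in the
        # response score when that span is removed from the context.
        full = score(spans, response)
        return [
            full - score(spans[:i] + spans[i + 1:], response)
            for i in range(len(spans))
        ]

    # Toy usage: a real scorer would query an LLM for the log-likelihood of
    # `response` given the concatenated spans; this stand-in just rewards
    # spans that mention "Paris".
    def toy_score(spans, response):
        return 0.1 * sum(len(s) for s in spans if "Paris" in s)

    spans = ["Paris is the capital of France.", "Bananas are yellow."]
    print(loo_attributions(spans, "The capital of France is Paris.", toy_score))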
Merging has become a widespread way to cheaply combine individual models into a single model that inherits their capabilities and attains better performance. This popularity has spurred rapid development of many new merging methods, which are typically …
External link:
http://arxiv.org/abs/2409.18314
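For context on what "merging" refers to in the entry above, below is a minimal sketch of the simplest baseline, uniform parameter averaging over models that share an architecture, written against PyTorch state_dicts; it is a generic illustration, not a method from the paper.

    import torch

    def average_merge(state_dicts):
        # Uniformly average matching parameters across models with identical
        # architectures (the simplest merging baseline).
        return {
            name: torch.stack([sd[name].float() for sd in state_dicts]).mean(dim=0)
            for name in state_dicts[0]
        }

    # Usage: load several fine-tuned checkpoints of the same base model, merge
    # their weights, and load the result back into one model instance.
    # merged = average_merge([m1.state_dict(), m2.state_dict(), m3.state_dict()])
    # base_model.load_state_dict(merged)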
Author:
Yadav, Prateek, Raffel, Colin, Muqeeth, Mohammed, Caccia, Lucas, Liu, Haokun, Chen, Tianlong, Bansal, Mohit, Choshen, Leshem, Sordoni, Alessandro
The availability of performant pre-trained models has led to a proliferation of fine-tuned expert models that are specialized to a particular domain or task. Model MoErging methods aim to recycle expert models to create an aggregate system with improved …
External link:
http://arxiv.org/abs/2408.07057
Author:
Penedo, Guilherme, Kydlíček, Hynek, Allal, Loubna Ben, Lozhkov, Anton, Mitchell, Margaret, Raffel, Colin, Von Werra, Leandro, Wolf, Thomas
The performance of a large language model (LLM) depends heavily on the quality and size of its pretraining dataset. However, the pretraining datasets for state-of-the-art open LLMs like Llama 3 and Mixtral are not publicly available and very little is …
External link:
http://arxiv.org/abs/2406.17557
Large language models (LLMs) have achieved state-of-the-art performance in various language processing tasks, motivating their adoption in simultaneous translation. Current fine-tuning methods to adapt LLMs for simultaneous translation focus on prompt …
External link:
http://arxiv.org/abs/2405.10443
Author:
Pan, Bowen, Shen, Yikang, Liu, Haokun, Mishra, Mayank, Zhang, Gaoyuan, Oliva, Aude, Raffel, Colin, Panda, Rameswar
Mixture-of-Experts (MoE) language models can reduce computational costs by 2-4× compared to dense models without sacrificing performance, making them more efficient in computation-bounded scenarios. However, MoE models generally require 2-4× more parameters …
External link:
http://arxiv.org/abs/2404.05567
Author:
Albalak, Alon, Elazar, Yanai, Xie, Sang Michael, Longpre, Shayne, Lambert, Nathan, Wang, Xinyi, Muennighoff, Niklas, Hou, Bairu, Pan, Liangming, Jeong, Haewon, Raffel, Colin, Chang, Shiyu, Hashimoto, Tatsunori, Wang, William Yang
A major factor in the recent success of large language models is the use of enormous and ever-growing text datasets for unsupervised pre-training. However, naively training a model on all available data may not be optimal (or feasible), as the quality …
External link:
http://arxiv.org/abs/2402.16827
Large language models (LLMs) have become a dominant and important tool for NLP researchers in a wide range of tasks. Today, many researchers use LLMs in synthetic data generation, task evaluation, fine-tuning, distillation, and other model-in-the-loop …
External link:
http://arxiv.org/abs/2402.10379
Recently, there has been a widespread proliferation of "expert" language models that are specialized to a specific task or domain through parameter-efficient fine-tuning. How can we recycle large collections of expert language models to improve zero-shot …
External link:
http://arxiv.org/abs/2402.05859
Author:
Borzunov, Alexander, Ryabinin, Max, Chumachenko, Artem, Baranchuk, Dmitry, Dettmers, Tim, Belkada, Younes, Samygin, Pavel, Raffel, Colin
Large language models (LLMs) are useful in many NLP tasks and become more capable with size, with the best open-source models having over 50 billion parameters. However, using these 50B+ models requires high-end hardware, making them inaccessible to …
External link:
http://arxiv.org/abs/2312.08361