Showing 1 - 10 of 6,298 results for the search: '"A. Raffel"'
Merging has become a widespread way to cheaply combine individual models into a single model that inherits their capabilities and attains better performance. This popularity has spurred rapid development of many new merging methods, which are typical…
External link:
http://arxiv.org/abs/2409.18314
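The snippet above refers to model merging only in general terms; as an illustration (not this paper's specific method), the simplest merging variant is uniform parameter averaging of fine-tuned checkpoints that share an architecture. A minimal sketch, assuming PyTorch-style state dicts:

import torch

def average_merge(state_dicts):
    # Uniform averaging ("model soup" style): element-wise mean of the
    # parameters of several fine-tuned checkpoints with identical keys.
    merged = {}
    for key in state_dicts[0]:
        merged[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    return merged

More sophisticated merging methods reweight or sparsify per-model parameter differences instead of averaging raw weights, but the contract is the same: several checkpoints in, one checkpoint out.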
Author:
Yadav, Prateek, Raffel, Colin, Muqeeth, Mohammed, Caccia, Lucas, Liu, Haokun, Chen, Tianlong, Bansal, Mohit, Choshen, Leshem, Sordoni, Alessandro
The availability of performant pre-trained models has led to a proliferation of fine-tuned expert models that are specialized to a particular domain or task. Model MoErging methods aim to recycle expert models to create an aggregate system with impro…
External link:
http://arxiv.org/abs/2408.07057
Author:
Penedo, Guilherme, Kydlíček, Hynek, Allal, Loubna Ben, Lozhkov, Anton, Mitchell, Margaret, Raffel, Colin, Von Werra, Leandro, Wolf, Thomas
The performance of a large language model (LLM) depends heavily on the quality and size of its pretraining dataset. However, the pretraining datasets for state-of-the-art open LLMs like Llama 3 and Mixtral are not publicly available and very little i…
External link:
http://arxiv.org/abs/2406.17557
Large language models (LLMs) have achieved state-of-the-art performance in various language processing tasks, motivating their adoption in simultaneous translation. Current fine-tuning methods to adapt LLMs for simultaneous translation focus on promp…
External link:
http://arxiv.org/abs/2405.10443
Author:
Pan, Bowen, Shen, Yikang, Liu, Haokun, Mishra, Mayank, Zhang, Gaoyuan, Oliva, Aude, Raffel, Colin, Panda, Rameswar
Mixture-of-Experts (MoE) language models can reduce computational costs by 2-4$\times$ compared to dense models without sacrificing performance, making them more efficient in computation-bounded scenarios. However, MoE models generally require 2-4$\times$…
External link:
http://arxiv.org/abs/2404.05567
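The 2-4$\times$ compute saving mentioned in the entry above comes from activating only a few experts per token. The following top-k routing layer is an illustrative sketch of that mechanism (not the specific architecture studied in the paper):

import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])
        self.k = k

    def forward(self, x):                        # x: (tokens, d_model)
        weights, idx = self.router(x).topk(self.k, dim=-1)
        weights = weights.softmax(dim=-1)        # mixing weights for the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e         # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

Each token touches only k of the n_experts feed-forward blocks, so per-token compute scales with k, while parameter count (and memory) still scales with the total number of experts, which is the trade-off the abstract alludes to.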
Author:
Albalak, Alon, Elazar, Yanai, Xie, Sang Michael, Longpre, Shayne, Lambert, Nathan, Wang, Xinyi, Muennighoff, Niklas, Hou, Bairu, Pan, Liangming, Jeong, Haewon, Raffel, Colin, Chang, Shiyu, Hashimoto, Tatsunori, Wang, William Yang
A major factor in the recent success of large language models is the use of enormous and ever-growing text datasets for unsupervised pre-training. However, naively training a model on all available data may not be optimal (or feasible), as the qualit…
External link:
http://arxiv.org/abs/2402.16827
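Data selection, as surveyed in the entry above, replaces "train on everything" with "score, rank, and keep a subset". A deliberately simple heuristic sketch (illustrative only, not a method from the survey; the scoring rule is an assumption):

def quality_score(doc: str) -> float:
    # Toy quality heuristic: favour mostly-alphabetic text whose average
    # word length looks like natural language.
    words = doc.split()
    if not words:
        return 0.0
    alpha_ratio = sum(c.isalpha() for c in doc) / len(doc)
    mean_word_len = sum(len(w) for w in words) / len(words)
    return alpha_ratio - abs(mean_word_len - 5.0) / 10.0

def select_top_fraction(docs, fraction=0.5):
    # Keep only the highest-scoring share of the corpus for pretraining.
    ranked = sorted(docs, key=quality_score, reverse=True)
    return ranked[: max(1, int(len(ranked) * fraction))]

Real pipelines swap the toy heuristic for classifier scores, perplexity under a reference model, or deduplication signals, but the select-then-train structure is the same.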
Large language models (LLMs) have become a dominant and important tool for NLP researchers in a wide range of tasks. Today, many researchers use LLMs in synthetic data generation, task evaluation, fine-tuning, distillation, and other model-in-the-loop…
External link:
http://arxiv.org/abs/2402.10379
Recently, there has been a widespread proliferation of "expert" language models that are specialized to a specific task or domain through parameter-efficient fine-tuning. How can we recycle large collections of expert language models to improve zero-shot…
External link:
http://arxiv.org/abs/2402.05859
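One common way to recycle a pool of task-specific experts for a new input, sketched here purely for illustration (the centroid-based routing and the function names are assumptions, not this paper's method), is to pick the expert whose training data the query most resembles:

import numpy as np

def route_to_expert(query_embedding, expert_centroids):
    # expert_centroids: dict mapping expert name -> mean embedding of the
    # data that expert was fine-tuned on. Return the nearest expert by
    # cosine similarity; its adapter would then handle the query.
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
    return max(expert_centroids, key=lambda name: cosine(query_embedding, expert_centroids[name]))

Zero-shot generalization then hinges on how well similarity in embedding space predicts which expert's skills transfer to the unseen task.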
Author:
Borzunov, Alexander, Ryabinin, Max, Chumachenko, Artem, Baranchuk, Dmitry, Dettmers, Tim, Belkada, Younes, Samygin, Pavel, Raffel, Colin
Large language models (LLMs) are useful in many NLP tasks and become more capable with size, with the best open-source models having over 50 billion parameters. However, using these 50B+ models requires high-end hardware, making them inaccessible to…
External link:
http://arxiv.org/abs/2312.08361
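A rough back-of-the-envelope estimate (an illustration, not a figure from the paper) of why 50B+ models demand high-end hardware:

params = 50e9                    # 50 billion parameters
bytes_per_param = 2              # 16-bit (fp16/bf16) weights
weight_gb = params * bytes_per_param / 1e9
print(f"~{weight_gb:.0f} GB of weights alone")  # ~100 GB, before activations or KV cache

That is several times the memory of any single consumer GPU, which is what motivates offloading, quantization, or distributing inference across machines.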
Large language models (LLMs) with billions of parameters and pretrained on massive amounts of data are now capable of near or better than state-of-the-art performance in a variety of downstream natural language processing tasks. Neural machine translation…
External link:
http://arxiv.org/abs/2312.04691