Showing 1 - 10 of 6,291
for search: '"Raffel, A"'
The influence of contextual input on the behavior of large language models (LLMs) has prompted the development of context attribution methods that aim to quantify each context span's effect on an LLM's generations. The leave-one-out (LOO) error, which …
External link:
http://arxiv.org/abs/2411.15102
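The snippet above only names the leave-one-out idea, so here is a minimal sketch of LOO-style context attribution in Python, assuming a caller-supplied score function that returns the model's log-likelihood of a response given a list of context spans; the function names and the toy scorer are illustrative stand-ins, not the paper's implementation.

    from typing import Callable, List

    def loo_attributions(
        spans: List[str],
        response: str,
        score: Callable[[List[str], str], float],
    ) -> List[float]:
        # Leave-one-out attribution: for each context span, the drop in the
        # response score when that span is removed from the context.
        full = score(spans, response)
        return [
            full - score(spans[:i] + spans[i + 1:], response)
            for i in range(len(spans))
        ]

    # Toy usage: a real scorer would query an LLM for the log-likelihood of
    # `response` given the concatenated spans; this stand-in just rewards
    # spans that mention "Paris".
    def toy_score(spans, response):
        return 0.1 * sum(len(s) for s in spans if "Paris" in s)

    spans = ["Paris is the capital of France.", "Bananas are yellow."]
    print(loo_attributions(spans, "The capital of France is Paris.", toy_score))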
Merging has become a widespread way to cheaply combine individual models into a single model that inherits their capabilities and attains better performance. This popularity has spurred rapid development of many new merging methods, which are typically …
External link:
http://arxiv.org/abs/2409.18314
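For context on what "merging" refers to in the entry above, below is a minimal sketch of the simplest baseline, uniform parameter averaging over models that share an architecture, written against PyTorch state_dicts; it is a generic illustration, not a method from the paper.

    import torch

    def average_merge(state_dicts):
        # Uniformly average matching parameters across models with identical
        # architectures (the simplest merging baseline).
        return {
            name: torch.stack([sd[name].float() for sd in state_dicts]).mean(dim=0)
            for name in state_dicts[0]
        }

    # Usage: load several fine-tuned checkpoints of the same base model, merge
    # their weights, and load the result back into one model instance.
    # merged = average_merge([m1.state_dict(), m2.state_dict(), m3.state_dict()])
    # base_model.load_state_dict(merged)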
Author:
Yadav, Prateek, Raffel, Colin, Muqeeth, Mohammed, Caccia, Lucas, Liu, Haokun, Chen, Tianlong, Bansal, Mohit, Choshen, Leshem, Sordoni, Alessandro
The availability of performant pre-trained models has led to a proliferation of fine-tuned expert models that are specialized to a particular domain or task. Model MoErging methods aim to recycle expert models to create an aggregate system with improved …
External link:
http://arxiv.org/abs/2408.07057
Author:
Penedo, Guilherme, Kydlíček, Hynek, Allal, Loubna Ben, Lozhkov, Anton, Mitchell, Margaret, Raffel, Colin, Von Werra, Leandro, Wolf, Thomas
The performance of a large language model (LLM) depends heavily on the quality and size of its pretraining dataset. However, the pretraining datasets for state-of-the-art open LLMs like Llama 3 and Mixtral are not publicly available and very little is …
External link:
http://arxiv.org/abs/2406.17557
Large language models (LLMs) have achieved state-of-the-art performance in various language processing tasks, motivating their adoption in simultaneous translation. Current fine-tuning methods to adapt LLMs for simultaneous translation focus on prompt …
External link:
http://arxiv.org/abs/2405.10443
Author:
Pan, Bowen, Shen, Yikang, Liu, Haokun, Mishra, Mayank, Zhang, Gaoyuan, Oliva, Aude, Raffel, Colin, Panda, Rameswar
Mixture-of-Experts (MoE) language models can reduce computational costs by 2-4× compared to dense models without sacrificing performance, making them more efficient in computation-bounded scenarios. However, MoE models generally require 2-4× more parameters …
External link:
http://arxiv.org/abs/2404.05567
Author:
Albalak, Alon, Elazar, Yanai, Xie, Sang Michael, Longpre, Shayne, Lambert, Nathan, Wang, Xinyi, Muennighoff, Niklas, Hou, Bairu, Pan, Liangming, Jeong, Haewon, Raffel, Colin, Chang, Shiyu, Hashimoto, Tatsunori, Wang, William Yang
A major factor in the recent success of large language models is the use of enormous and ever-growing text datasets for unsupervised pre-training. However, naively training a model on all available data may not be optimal (or feasible), as the quality …
External link:
http://arxiv.org/abs/2402.16827
Large language models (LLMs) have become a dominant and important tool for NLP researchers in a wide range of tasks. Today, many researchers use LLMs in synthetic data generation, task evaluation, fine-tuning, distillation, and other model-in-the-loop …
External link:
http://arxiv.org/abs/2402.10379
Recently, there has been a widespread proliferation of "expert" language models that are specialized to a specific task or domain through parameter-efficient fine-tuning. How can we recycle large collections of expert language models to improve zero-shot …
External link:
http://arxiv.org/abs/2402.05859
Author:
Borzunov, Alexander, Ryabinin, Max, Chumachenko, Artem, Baranchuk, Dmitry, Dettmers, Tim, Belkada, Younes, Samygin, Pavel, Raffel, Colin
Large language models (LLMs) are useful in many NLP tasks and become more capable with size, with the best open-source models having over 50 billion parameters. However, using these 50B+ models requires high-end hardware, making them inaccessible to …
External link:
http://arxiv.org/abs/2312.08361