Showing 1 - 10 of 115 for search: '"Morcos, Ari"'
Data curation is commonly considered a "secret-sauce" for LLM training, with higher quality data usually leading to better LLM performance. Given the scale of internet-scraped corpora, data pruning has become a larger and larger focus. Specifically, …
External link:
http://arxiv.org/abs/2407.00434
Author:
Abbas, Amro, Rusak, Evgenia, Tirumala, Kushal, Brendel, Wieland, Chaudhuri, Kamalika, Morcos, Ari S.
Utilizing massive web-scale datasets has led to unprecedented performance gains in machine learning models, but also imposes outlandish compute requirements for their training. In order to improve training and data efficiency, we here push the limits …
External link:
http://arxiv.org/abs/2401.04578
Author:
Yang, Yu, Singh, Aaditya K., Elhoushi, Mostafa, Mahmoud, Anas, Tirumala, Kushal, Gloeckle, Fabian, Rozière, Baptiste, Wu, Carole-Jean, Morcos, Ari S., Ardalani, Newsha
Code datasets, often collected from diverse and uncontrolled sources such as GitHub, potentially suffer from quality issues, thereby affecting the performance and training efficiency of Large Language Models (LLMs) optimized for code generation. Prev…
External link:
http://arxiv.org/abs/2312.02418
Author:
Mahmoud, Anas, Elhoushi, Mostafa, Abbas, Amro, Yang, Yu, Ardalani, Newsha, Leather, Hugh, Morcos, Ari
Vision-Language Models (VLMs) are pretrained on large, diverse, and noisy web-crawled datasets. This underscores the critical need for dataset pruning, as the quality of these datasets is strongly correlated with the performance of VLMs on downstream …
External link:
http://arxiv.org/abs/2310.02110
Over recent years, an increasing amount of compute and data has been poured into training large language models (LLMs), usually by doing one-pass learning on as many tokens as possible randomly selected from large-scale web corpora. While training on …
External link:
http://arxiv.org/abs/2308.12284
Author:
Bordes, Florian, Shekhar, Shashank, Ibrahim, Mark, Bouchacourt, Diane, Vincent, Pascal, Morcos, Ari S.
Synthetic image datasets offer unmatched advantages for designing and evaluating deep neural networks: they make it possible to (i) render as many data samples as needed, (ii) precisely control each scene and yield granular ground truth labels (and c…
External link:
http://arxiv.org/abs/2308.03977
It is commonly observed that deep networks trained for classification exhibit class-selective neurons in their early and intermediate layers. Intriguingly, recent studies have shown that these class-selective neurons can be ablated without deteriorat…
External link:
http://arxiv.org/abs/2305.17409
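The entry above reports that class-selective neurons can be ablated without deteriorating the network. As a rough, hypothetical illustration (not the paper's code), the sketch below computes a commonly used class-selectivity index per unit, (mu_max - mu_rest) / (mu_max + mu_rest), over nonnegative activations, and then zeroes out the most selective units; the function names, toy data, and choice of k are assumptions made for this example.

import numpy as np

def class_selectivity(activations, labels, eps=1e-7):
    # activations: (n_samples, n_units), nonnegative (e.g. post-ReLU); labels: (n_samples,) ints.
    # Returns one selectivity index per unit, in [0, 1].
    classes = np.unique(labels)
    # Per-class mean activation of every unit: shape (n_classes, n_units).
    class_means = np.stack([activations[labels == c].mean(axis=0) for c in classes])
    mu_max = class_means.max(axis=0)                                    # most-activating class
    mu_rest = (class_means.sum(axis=0) - mu_max) / (len(classes) - 1)   # mean over all other classes
    return (mu_max - mu_rest) / (mu_max + mu_rest + eps)

def ablate_most_selective(activations, selectivity, k):
    # Zero out (ablate) the k most class-selective units; returns a copy.
    ablated = activations.copy()
    ablated[:, np.argsort(selectivity)[-k:]] = 0.0
    return ablated

# Toy usage: 1000 samples, 64 units, 10 classes of random activations.
rng = np.random.default_rng(0)
acts = rng.random((1000, 64))
labels = rng.integers(0, 10, size=1000)
sel = class_selectivity(acts, labels)
acts_ablated = ablate_most_selective(acts, sel, k=8)

In an actual ablation experiment one would feed the ablated activations through the rest of the network and compare accuracy before and after, which is the kind of comparison the abstract alludes to.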
Joint-embedding based learning (e.g., SimCLR, MoCo, DINO) and reconstruction-based learning (e.g., BEiT, SimMIM, MAE) are the two leading paradigms for self-supervised learning of vision transformers, but they differ substantially in their transfer p…
External link:
http://arxiv.org/abs/2304.13089
Author:
Wortsman, Mitchell, Dettmers, Tim, Zettlemoyer, Luke, Morcos, Ari, Farhadi, Ali, Schmidt, Ludwig
We introduce new methods for 1) accelerating and 2) stabilizing training for large language-vision models. 1) For acceleration, we introduce SwitchBack, a linear layer for int8 quantized training which provides a speed-up of 13-25% while matching the …
External link:
http://arxiv.org/abs/2304.13013
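The SwitchBack entry above describes a linear layer for int8 quantized training. The sketch below is only a generic illustration of the quantize / int32-accumulate / dequantize pattern such a layer builds on, not the SwitchBack method itself (which targets training speed and stability that this toy ignores); the symmetric per-tensor scaling and all names here are assumptions.

import numpy as np

def quantize_int8(x):
    # Symmetric per-tensor quantization to int8; returns (quantized array, scale).
    scale = np.abs(x).max() / 127.0 + 1e-12
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_linear(x, w, b=None):
    # y = x @ w.T + b with x and w quantized to int8; the matmul accumulates
    # in int32 and the result is dequantized back to floating point.
    qx, sx = quantize_int8(x)
    qw, sw = quantize_int8(w)
    acc = qx.astype(np.int32) @ qw.astype(np.int32).T
    y = acc * (sx * sw)
    if b is not None:
        y = y + b
    return y

# Toy usage: batch of 4 inputs, 16 input features, 8 output features.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16)).astype(np.float32)
w = rng.standard_normal((8, 16)).astype(np.float32)
b = np.zeros(8, dtype=np.float32)
print(np.abs(int8_linear(x, w, b) - (x @ w.T + b)).max())  # small quantization error

Per-row or per-channel scales and handling of the backward pass are where a real int8 training layer would differ from this toy.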
Author:
Balestriero, Randall, Ibrahim, Mark, Sobal, Vlad, Morcos, Ari, Shekhar, Shashank, Goldstein, Tom, Bordes, Florian, Bardes, Adrien, Mialon, Gregoire, Tian, Yuandong, Schwarzschild, Avi, Wilson, Andrew Gordon, Geiping, Jonas, Garrido, Quentin, Fernandez, Pierre, Bar, Amir, Pirsiavash, Hamed, LeCun, Yann, Goldblum, Micah
Self-supervised learning, dubbed the dark matter of intelligence, is a promising path to advance machine learning. Yet, much like cooking, training SSL methods is a delicate art with a high barrier to entry. While many components are familiar, succes…
External link:
http://arxiv.org/abs/2304.12210