Showing 1 - 10 of 115 for search: '"Morcos, Ari"'
Data curation is commonly considered a "secret-sauce" for LLM training, with higher quality data usually leading to better LLM performance. Given the scale of internet-scraped corpora, data pruning has become a larger and larger focus. Specifically, …
External link:
http://arxiv.org/abs/2407.00434
Author:
Abbas, Amro, Rusak, Evgenia, Tirumala, Kushal, Brendel, Wieland, Chaudhuri, Kamalika, Morcos, Ari S.
Utilizing massive web-scale datasets has led to unprecedented performance gains in machine learning models, but also imposes outlandish compute requirements for their training. In order to improve training and data efficiency, we here push the limits …
External link:
http://arxiv.org/abs/2401.04578
Author:
Yang, Yu, Singh, Aaditya K., Elhoushi, Mostafa, Mahmoud, Anas, Tirumala, Kushal, Gloeckle, Fabian, Rozière, Baptiste, Wu, Carole-Jean, Morcos, Ari S., Ardalani, Newsha
Code datasets, often collected from diverse and uncontrolled sources such as GitHub, potentially suffer from quality issues, thereby affecting the performance and training efficiency of Large Language Models (LLMs) optimized for code generation. Prev…
External link:
http://arxiv.org/abs/2312.02418
Author:
Mahmoud, Anas, Elhoushi, Mostafa, Abbas, Amro, Yang, Yu, Ardalani, Newsha, Leather, Hugh, Morcos, Ari
Vision-Language Models (VLMs) are pretrained on large, diverse, and noisy web-crawled datasets. This underscores the critical need for dataset pruning, as the quality of these datasets is strongly correlated with the performance of VLMs on downstream …
External link:
http://arxiv.org/abs/2310.02110
Over recent years, an increasing amount of compute and data has been poured into training large language models (LLMs), usually by doing one-pass learning on as many tokens as possible randomly selected from large-scale web corpora. While training on …
External link:
http://arxiv.org/abs/2308.12284
Author:
Bordes, Florian, Shekhar, Shashank, Ibrahim, Mark, Bouchacourt, Diane, Vincent, Pascal, Morcos, Ari S.
Synthetic image datasets offer unmatched advantages for designing and evaluating deep neural networks: they make it possible to (i) render as many data samples as needed, (ii) precisely control each scene and yield granular ground truth labels (and c…
External link:
http://arxiv.org/abs/2308.03977
It is commonly observed that deep networks trained for classification exhibit class-selective neurons in their early and intermediate layers. Intriguingly, recent studies have shown that these class-selective neurons can be ablated without deteriorat…
External link:
http://arxiv.org/abs/2305.17409
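The entry above reports that class-selective neurons can be ablated without deteriorating the network. As a rough, hypothetical illustration (not the paper's code), the sketch below computes a commonly used class-selectivity index per unit, (mu_max - mu_rest) / (mu_max + mu_rest), over nonnegative activations, and then zeroes out the most selective units; the function names, toy data, and choice of k are assumptions made for this example.

import numpy as np

def class_selectivity(activations, labels, eps=1e-7):
    # activations: (n_samples, n_units), nonnegative (e.g. post-ReLU); labels: (n_samples,) ints.
    # Returns one selectivity index per unit, in [0, 1].
    classes = np.unique(labels)
    # Per-class mean activation of every unit: shape (n_classes, n_units).
    class_means = np.stack([activations[labels == c].mean(axis=0) for c in classes])
    mu_max = class_means.max(axis=0)                                    # most-activating class
    mu_rest = (class_means.sum(axis=0) - mu_max) / (len(classes) - 1)   # mean over all other classes
    return (mu_max - mu_rest) / (mu_max + mu_rest + eps)

def ablate_most_selective(activations, selectivity, k):
    # Zero out (ablate) the k most class-selective units; returns a copy.
    ablated = activations.copy()
    ablated[:, np.argsort(selectivity)[-k:]] = 0.0
    return ablated

# Toy usage: 1000 samples, 64 units, 10 classes of random activations.
rng = np.random.default_rng(0)
acts = rng.random((1000, 64))
labels = rng.integers(0, 10, size=1000)
sel = class_selectivity(acts, labels)
acts_ablated = ablate_most_selective(acts, sel, k=8)

In an actual ablation experiment one would feed the ablated activations through the rest of the network and compare accuracy before and after, which is the kind of comparison the abstract alludes to.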
Joint-embedding based learning (e.g., SimCLR, MoCo, DINO) and reconstruction-based learning (e.g., BEiT, SimMIM, MAE) are the two leading paradigms for self-supervised learning of vision transformers, but they differ substantially in their transfer p…
External link:
http://arxiv.org/abs/2304.13089
Author:
Wortsman, Mitchell, Dettmers, Tim, Zettlemoyer, Luke, Morcos, Ari, Farhadi, Ali, Schmidt, Ludwig
We introduce new methods for 1) accelerating and 2) stabilizing training for large language-vision models. 1) For acceleration, we introduce SwitchBack, a linear layer for int8 quantized training which provides a speed-up of 13-25% while matching the …
External link:
http://arxiv.org/abs/2304.13013
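The SwitchBack entry above describes a linear layer for int8 quantized training. The sketch below is only a generic illustration of the quantize / int32-accumulate / dequantize pattern such a layer builds on, not the SwitchBack method itself (which targets training speed and stability that this toy ignores); the symmetric per-tensor scaling and all names here are assumptions.

import numpy as np

def quantize_int8(x):
    # Symmetric per-tensor quantization to int8; returns (quantized array, scale).
    scale = np.abs(x).max() / 127.0 + 1e-12
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_linear(x, w, b=None):
    # y = x @ w.T + b with x and w quantized to int8; the matmul accumulates
    # in int32 and the result is dequantized back to floating point.
    qx, sx = quantize_int8(x)
    qw, sw = quantize_int8(w)
    acc = qx.astype(np.int32) @ qw.astype(np.int32).T
    y = acc * (sx * sw)
    if b is not None:
        y = y + b
    return y

# Toy usage: batch of 4 inputs, 16 input features, 8 output features.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16)).astype(np.float32)
w = rng.standard_normal((8, 16)).astype(np.float32)
b = np.zeros(8, dtype=np.float32)
print(np.abs(int8_linear(x, w, b) - (x @ w.T + b)).max())  # small quantization error

Per-row or per-channel scales and handling of the backward pass are where a real int8 training layer would differ from this toy.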
Author:
Balestriero, Randall, Ibrahim, Mark, Sobal, Vlad, Morcos, Ari, Shekhar, Shashank, Goldstein, Tom, Bordes, Florian, Bardes, Adrien, Mialon, Gregoire, Tian, Yuandong, Schwarzschild, Avi, Wilson, Andrew Gordon, Geiping, Jonas, Garrido, Quentin, Fernandez, Pierre, Bar, Amir, Pirsiavash, Hamed, LeCun, Yann, Goldblum, Micah
Self-supervised learning, dubbed the dark matter of intelligence, is a promising path to advance machine learning. Yet, much like cooking, training SSL methods is a delicate art with a high barrier to entry. While many components are familiar, succes…
External link:
http://arxiv.org/abs/2304.12210