Showing 1 - 10 of 6,500 for search: '"Morcos AS"'
Data curation is commonly considered a "secret-sauce" for LLM training, with higher quality data usually leading to better LLM performance. Given the scale of internet-scraped corpora, data pruning has become a larger and larger focus. Specifically,
External link:
http://arxiv.org/abs/2407.00434
Author:
Abbas, Amro, Rusak, Evgenia, Tirumala, Kushal, Brendel, Wieland, Chaudhuri, Kamalika, Morcos, Ari S.
Utilizing massive web-scale datasets has led to unprecedented performance gains in machine learning models, but also imposes outlandish compute requirements for their training. In order to improve training and data efficiency, we here push the limits
External link:
http://arxiv.org/abs/2401.04578
Author:
Yang, Yu, Singh, Aaditya K., Elhoushi, Mostafa, Mahmoud, Anas, Tirumala, Kushal, Gloeckle, Fabian, Rozière, Baptiste, Wu, Carole-Jean, Morcos, Ari S., Ardalani, Newsha
Code datasets, often collected from diverse and uncontrolled sources such as GitHub, potentially suffer from quality issues, thereby affecting the performance and training efficiency of Large Language Models (LLMs) optimized for code generation. Prev
External link:
http://arxiv.org/abs/2312.02418
Author:
Sofia Sheikh, Brent Vela, Pejman Honarmandi, Peter Morcos, David Shoukr, Ibrahim Karaman, Alaa Elwany, Raymundo Arróyave
Published in:
npj Computational Materials, Vol 10, Iss 1, Pp 1-19 (2024)
Abstract In metal additive manufacturing (AM), processing parameters can affect the probability of macroscopic defect formation (lack-of-fusion, keyholing, balling), which can, in turn, jeopardize the final product’s integrity. A printability map c
External link:
https://doaj.org/article/f249ce0fd0ea4525ae43b714abecbcc8
Author:
Golrokhian-Sani, Amir-Ali, Morcos, Maya, Philippi, Alecco, Al-Rawi, Reem, Morcos, Marc marc.morcos1@hotmail.com, Fu, Rui
Published in:
PLoS ONE, 12/30/2024, Vol. 19, Issue 12, pp. 1-11
Author:
Aboushelib, Mohamed F., Morcos, Abdelfady B., Nawar, Samir, Shalabiea, Osama M., Awad, Zainab
Published in:
Nature portfolio, Scientific Reports (2023), volume 13, page 16754
Photoelectric observations of night sky brightness (NSB) at different zenith distances and azimuths, covering all the sky, at the Egyptian Kottamia Astronomical Observatory (KAO) site of coordinates φ = 29°55.9′N and λ = 31°49.
External link:
http://arxiv.org/abs/2310.05429
Author:
Mahmoud, Anas, Elhoushi, Mostafa, Abbas, Amro, Yang, Yu, Ardalani, Newsha, Leather, Hugh, Morcos, Ari
Vision-Language Models (VLMs) are pretrained on large, diverse, and noisy web-crawled datasets. This underscores the critical need for dataset pruning, as the quality of these datasets is strongly correlated with the performance of VLMs on downstream
External link:
http://arxiv.org/abs/2310.02110
Over recent years, an increasing amount of compute and data has been poured into training large language models (LLMs), usually by doing one-pass learning on as many tokens as possible randomly selected from large-scale web corpora. While training on
External link:
http://arxiv.org/abs/2308.12284
Author:
Bordes, Florian, Shekhar, Shashank, Ibrahim, Mark, Bouchacourt, Diane, Vincent, Pascal, Morcos, Ari S.
Synthetic image datasets offer unmatched advantages for designing and evaluating deep neural networks: they make it possible to (i) render as many data samples as needed, (ii) precisely control each scene and yield granular ground truth labels (and c
External link:
http://arxiv.org/abs/2308.03977
It is commonly observed that deep networks trained for classification exhibit class-selective neurons in their early and intermediate layers. Intriguingly, recent studies have shown that these class-selective neurons can be ablated without deteriorat
External link:
http://arxiv.org/abs/2305.17409