Showing 1 - 10 of 657 for search: '"GARG, SAURABH"'
Author:
Byun, Yewon, Mehta, Sanket V., Garg, Saurabh, Strubell, Emma, Oberst, Michael, Wilder, Bryan, Lipton, Zachary C.
In many real-world settings, regulations and economic incentives permit the sharing of models but not data across institutional boundaries. In such scenarios, practitioners might hope to adapt models to new domains, without losing performance on previous domains …
External link:
http://arxiv.org/abs/2412.17009
Several recent works seek to develop foundation models specifically for medical applications, adapting general-purpose large language models (LLMs) and vision-language models (VLMs) via continued pretraining on publicly available biomedical corpora.
External link:
http://arxiv.org/abs/2411.08870
Several recent works seek to develop foundation models specifically for medical applications, adapting general-purpose large language models (LLMs) and vision-language models (VLMs) via continued pretraining on publicly available biomedical corpora.
External link:
http://arxiv.org/abs/2411.04118
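
The two entries above describe the same basic recipe: continued pretraining of a general-purpose model on domain (here, biomedical) text. As a rough, minimal sketch of that recipe (not these papers' actual setup), assuming a pre-tokenized domain corpus named tokenized_biomed_corpus and using gpt2 purely as a stand-in base model:

from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # stand-in base model
tokenizer.pad_token = tokenizer.eos_token           # gpt2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ckpts",
                           num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=tokenized_biomed_corpus,  # assumed: tokenized domain text dataset
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # causal LM objective
)
trainer.train()  # continue next-token prediction on the domain corpus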
Author:
Agrawal, Pravesh, Antoniak, Szymon, Hanna, Emma Bou, Bout, Baptiste, Chaplot, Devendra, Chudnovsky, Jessica, Costa, Diogo, De Monicault, Baudouin, Garg, Saurabh, Gervet, Theophile, Ghosh, Soham, Héliou, Amélie, Jacob, Paul, Jiang, Albert Q., Khandelwal, Kartik, Lacroix, Timothée, Lample, Guillaume, Casas, Diego Las, Lavril, Thibaut, Scao, Teven Le, Lo, Andy, Marshall, William, Martin, Louis, Mensch, Arthur, Muddireddy, Pavankumar, Nemychnikova, Valera, Pellat, Marie, Von Platen, Patrick, Raghuraman, Nikhil, Rozière, Baptiste, Sablayrolles, Alexandre, Saulnier, Lucile, Sauvestre, Romain, Shang, Wendy, Soletskyi, Roman, Stewart, Lawrence, Stock, Pierre, Studnia, Joachim, Subramanian, Sandeep, Vaze, Sagar, Wang, Thomas, Yang, Sophia
We introduce Pixtral-12B, a 12-billion-parameter multimodal language model. Pixtral-12B is trained to understand both natural images and documents, achieving leading performance on various multimodal benchmarks, surpassing a number of larger models.
External link:
http://arxiv.org/abs/2410.07073
Training on model-generated synthetic data is a promising approach for finetuning LLMs, but it remains unclear when it helps or hurts. In this paper, we investigate this question for math reasoning via an empirical study, followed by building a conceptual understanding …
External link:
http://arxiv.org/abs/2406.14532
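
The entry above studies when training on model-generated synthetic data helps. A common way to construct such data for math is answer-filtered sampling: sample several solutions per problem and keep those whose final answer matches the reference. The sketch below illustrates only that general idea; generate_solutions and final_answer are hypothetical helpers, and this is not necessarily the paper's pipeline:

def build_synthetic_set(problems, generate_solutions, final_answer, n_samples=4):
    # Keep sampled solutions whose extracted final answer matches the reference.
    dataset = []
    for prob in problems:  # each prob: {"question": str, "answer": str}
        for sol in generate_solutions(prob["question"], n=n_samples):
            if final_answer(sol) == prob["answer"]:  # verifiable correctness filter
                dataset.append({"prompt": prob["question"], "completion": sol})
    return dataset  # finetune the LLM on these prompt/completion pairs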
Author:
Li, Jeffrey, Fang, Alex, Smyrnis, Georgios, Ivgi, Maor, Jordan, Matt, Gadre, Samir, Bansal, Hritik, Guha, Etash, Keh, Sedrick, Arora, Kushal, Garg, Saurabh, Xin, Rui, Muennighoff, Niklas, Heckel, Reinhard, Mercat, Jean, Chen, Mayee, Gururangan, Suchin, Wortsman, Mitchell, Albalak, Alon, Bitton, Yonatan, Nezhurina, Marianna, Abbas, Amro, Hsieh, Cheng-Yu, Ghosh, Dhruba, Gardner, Josh, Kilian, Maciej, Zhang, Hanlin, Shao, Rulin, Pratt, Sarah, Sanyal, Sunny, Ilharco, Gabriel, Daras, Giannis, Marathe, Kalyani, Gokaslan, Aaron, Zhang, Jieyu, Chandu, Khyathi, Nguyen, Thao, Vasiljevic, Igor, Kakade, Sham, Song, Shuran, Sanghavi, Sujay, Faghri, Fartash, Oh, Sewoong, Zettlemoyer, Luke, Lo, Kyle, El-Nouby, Alaaeldin, Pouransari, Hadi, Toshev, Alexander, Wang, Stephanie, Groeneveld, Dirk, Soldaini, Luca, Koh, Pang Wei, Jitsev, Jenia, Kollar, Thomas, Dimakis, Alexandros G., Carmon, Yair, Dave, Achal, Schmidt, Ludwig, Shankar, Vaishaal
We introduce DataComp for Language Models (DCLM), a testbed for controlled dataset experiments with the goal of improving language models. As part of DCLM, we provide a standardized corpus of 240T tokens extracted from Common Crawl, effective pretraining recipes …
External link:
http://arxiv.org/abs/2406.11794
Trained models are often composed with post-hoc transforms such as temperature scaling (TS), ensembling and stochastic weight averaging (SWA) to improve performance, robustness, uncertainty estimation, etc. However, such transforms are typically applied …
External link:
http://arxiv.org/abs/2404.07815
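
Of the post-hoc transforms named in the entry above, temperature scaling is the simplest to illustrate: a single scalar T is fit on held-out logits to minimize negative log-likelihood, and test logits are divided by T before the softmax. A minimal PyTorch sketch, assuming held-out val_logits and val_labels tensors (illustrative, not the paper's code):

import torch
import torch.nn.functional as F

def fit_temperature(val_logits, val_labels):
    # Learn a single scalar temperature by minimizing NLL on held-out data.
    log_t = torch.zeros(1, requires_grad=True)  # optimize log T so T stays positive
    optimizer = torch.optim.LBFGS([log_t], lr=0.1, max_iter=50)

    def closure():
        optimizer.zero_grad()
        loss = F.cross_entropy(val_logits / log_t.exp(), val_labels)
        loss.backward()
        return loss

    optimizer.step(closure)
    return log_t.exp().item()

# Usage: T = fit_temperature(val_logits, val_labels)
#        calibrated = F.softmax(test_logits / T, dim=-1)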
Author:
Garg, Saurabh, Setlur, Amrith, Lipton, Zachary Chase, Balakrishnan, Sivaraman, Smith, Virginia, Raghunathan, Aditi
Self-training and contrastive learning have emerged as leading techniques for incorporating unlabeled data, both under distribution shift (unsupervised domain adaptation) and when it is absent (semi-supervised learning). However, despite the popularity …
External link:
http://arxiv.org/abs/2312.03318
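
For the entry above: self-training in its simplest pseudo-labeling form turns the model's own confident predictions on unlabeled data into training targets. A minimal sketch, assuming a trained classifier model and an unlabeled_loader of input batches (illustrative only, not the paper's method):

import torch
import torch.nn.functional as F

@torch.no_grad()
def pseudo_label(model, unlabeled_loader, threshold=0.9):
    # Collect high-confidence predictions on unlabeled data as training targets.
    model.eval()
    inputs, targets = [], []
    for x in unlabeled_loader:
        probs = F.softmax(model(x), dim=-1)
        conf, pred = probs.max(dim=-1)
        keep = conf >= threshold        # retain only confident examples
        inputs.append(x[keep])
        targets.append(pred[keep])
    return torch.cat(inputs), torch.cat(targets)

# The model is then retrained on these pairs (typically mixed with the labeled
# data), and the labeling/retraining loop may be repeated.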
Deep learning plays an important role in modern agriculture, especially in plant pathology using leaf images, where convolutional neural networks (CNN) are attracting a lot of attention. While numerous reviews have explored the applications of deep learning …
External link:
http://arxiv.org/abs/2310.16273
Author:
Garg, Saurabh, Farajtabar, Mehrdad, Pouransari, Hadi, Vemulapalli, Raviteja, Mehta, Sachin, Tuzel, Oncel, Shankar, Vaishaal, Faghri, Fartash
Keeping large foundation models up to date on the latest data is inherently expensive. To avoid the prohibitive costs of constantly retraining, it is imperative to continually train these models. This problem is exacerbated by the lack of any large-scale …
External link:
http://arxiv.org/abs/2310.16226