Showing 1 - 10 of 3,981 results for search: '"Sanghavi A"'
Mixtures of Experts (MoE) are Machine Learning models that involve partitioning the input space, with a separate "expert" model trained on each partition. Recently, MoE have become popular as components in today's large language models as a means to…
External link:
http://arxiv.org/abs/2411.06056
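The snippet above describes the basic MoE idea: the input space is partitioned and each partition is handled by its own expert. As an illustration only (not the paper's model), here is a minimal sketch of a softmax-gated mixture-of-experts layer in NumPy; the toy dimensions, expert count, and dense (non-sparse) gating are assumptions.

```python
# Minimal mixture-of-experts sketch (illustration only): a softmax gate assigns
# each input a weight per expert, and the layer output is the gate-weighted sum
# of per-expert linear transforms.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n_experts = 8, 4, 3                        # assumed toy sizes

W_gate = rng.normal(size=(d_in, n_experts))             # gating network
W_experts = rng.normal(size=(n_experts, d_in, d_out))   # one linear expert each

def moe_layer(x):
    """x: (batch, d_in) -> (batch, d_out) via gate-weighted expert outputs."""
    logits = x @ W_gate                                  # (batch, n_experts)
    gates = np.exp(logits - logits.max(axis=1, keepdims=True))
    gates /= gates.sum(axis=1, keepdims=True)            # softmax over experts
    expert_out = np.einsum("bi,eio->beo", x, W_experts)  # (batch, n_experts, d_out)
    return np.einsum("be,beo->bo", gates, expert_out)

x = rng.normal(size=(5, d_in))
print(moe_layer(x).shape)                                # (5, 4)
```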
Contrastive learning attempts to learn representations from unlabeled data; it does so via a loss function that encourages the embedding of a point to be close to that of its augmentations, and far from the embeddings of random other points. This si…
External link:
http://arxiv.org/abs/2411.03517
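The snippet describes the standard contrastive objective: pull an embedding toward its augmentations and push it away from other points. A minimal InfoNCE-style version of that loss is sketched below; the temperature value and the batch layout (row i of each matrix being two augmentations of the same point) are assumptions, not details taken from the paper.

```python
# Minimal InfoNCE-style contrastive loss sketch. Assumptions: z1[i] and z2[i]
# embed two augmentations of the same point (the positive pair); all other rows
# act as negatives; tau is a free temperature hyperparameter.
import numpy as np

def info_nce(z1, z2, tau=0.1):
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)  # cosine similarities
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / tau                                # (n, n) similarity matrix
    # log-softmax over each row; the diagonal entry is the positive pair
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
z1, z2 = rng.normal(size=(16, 32)), rng.normal(size=(16, 32))
print(info_nce(z1, z2))
```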
Author:
Carralot, F., Carones, A., Krachmalnicoff, N., Ghigna, T., Novelli, A., Pagano, L., Piacentini, F., Baccigalupi, C., Adak, D., Anand, A., Aumont, J., Azzoni, S., Ballardini, M., Banday, A. J., Barreiro, R. B., Bartolo, N., Basak, S., Basyrov, A., Bersanelli, M., Bortolami, M., Brinckmann, T., Cacciotti, F., Campeti, P., Carinos, E., Casas, F. J., Cheung, K., Clermont, L., Columbro, F., Conenna, G., Coppi, G., Coppolecchia, A., Cuttaia, F., de Bernardis, P., De Lucia, M., Della Torre, S., Di Giorgi, E., Diego-Palazuelos, P., Essinger-Hileman, T., Ferreira, E., Finelli, F., Franceschet, C., Galloni, G., Galloway, M., Gervasi, M., Génova-Santos, R. T., Giardiello, S., Gimeno-Amo, C., Gjerløw, E., Gruppuso, A., Hazumi, M., Henrot-Versillé, S., Hergt, L. T., Hivon, E., Ishino, H., Jost, B., Kohri, K., Lamagna, L., Leloup, C., Lembo, M., Levrier, F., Lonappan, A. I., López-Caniego, M., Luzzi, G., Macias-Perez, J., Martínez-González, E., Masi, S., Matarrese, S., Matsumura, T., Micheli, S., Monelli, M., Montier, L., Morgante, G., Mot, B., Mousset, L., Nagano, Y., Nagata, R., Namikawa, T., Natoli, P., Obata, I., Occhiuzzi, A., Paiella, A., Paoletti, D., Pascual-Cisneros, G., Patanchon, G., Pavlidou, V., Pisano, G., Polenta, G., Porcelli, L., Puglisi, G., Raffuzzi, N., Remazeilles, M., Rubiño-Martín, J. A., Ruiz-Granda, M., Sanghavi, J., Scott, D., Shiraishi, M., Sullivan, R. M., Takase, Y., Tassis, K., Terenzi, L., Tomasi, M., Tristram, M., Vacher, L., van Tent, B., Vielva, P., Weymann-Despres, G., Wollack, E. J., Zannoni, M., Zhou, Y.
Future cosmic microwave background (CMB) experiments are primarily targeting a detection of the primordial B-mode polarisation. The faintness of this signal requires exquisite control of systematic effects which may bias the measurements. In this w…
External link:
http://arxiv.org/abs/2411.02080
We investigate whether in-context examples, widely used in decoder-only language models (LLMs), can improve embedding model performance in retrieval tasks. Unlike in LLMs, naively prepending in-context examples (query-document pairs) to the target qu…
External link:
http://arxiv.org/abs/2410.20088
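The abstract refers to prepending in-context examples (query-document pairs) to the target query before it is embedded. A hedged sketch of that input construction follows; the template, separators, and field names are placeholders for illustration, not the paper's actual recipe.

```python
# Sketch of prepending in-context (query, document) pairs to a target query
# before embedding it. The textual template and separators are hypothetical.
def build_prompted_query(target_query, examples):
    """examples: list of (query, document) pairs used as in-context examples."""
    parts = [f"query: {q}\ndocument: {d}" for q, d in examples]
    parts.append(f"query: {target_query}")
    return "\n\n".join(parts)

examples = [("what is a mixture of experts?",
             "Mixtures of Experts partition the input space across experts ...")]
prompt = build_prompted_query("how do embedding models use in-context examples?", examples)
print(prompt)
# The resulting string would be passed to the embedding model's encoder
# in place of the raw target query.
```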
Author:
Wang, Haochen, Masui, Kiyoshi, Bandura, Kevin, Chakraborty, Arnab, Dobbs, Matt, Foreman, Simon, Gray, Liam, Halpern, Mark, Joseph, Albin, MacEachern, Joshua, Mena-Parra, Juan, Miller, Kyle, Newburgh, Laura, Paul, Sourabh, Reda, Alex, Sanghavi, Pranav, Siegel, Seth, Wulf, Dallas
The main challenge of 21 cm cosmology experiments is astrophysical foregrounds, which are difficult to separate from the signal due to telescope systematics. An earlier study has shown that foreground residuals induced by antenna gain errors can be es…
External link:
http://arxiv.org/abs/2408.08949
Author:
Tyndall, Will, Reda, Alex, Shaw, J. Richard, Bandura, Kevin, Chakraborty, Arnab, Kuhn, Emily, MacEachern, Joshua, Mena-Parra, Juan, Newburgh, Laura, Ordog, Anna, Pinsonneault-Marotte, Tristan, Polish, Anna Rose, Saliwanchik, Ben, Sanghavi, Pranav, Siegel, Seth R., Whitmer, Audrey, Wulf, Dallas
We present beam measurements of the CHIME telescope using a radio calibration source deployed on a drone payload. During test flights, the pulsing calibration source and the telescope were synchronized to GPS time, enabling in-situ background subtrac…
External link:
http://arxiv.org/abs/2407.04848
Author:
Low, Yen Sia, Jackson, Michael L., Hyde, Rebecca J., Brown, Robert E., Sanghavi, Neil M., Baldwin, Julian D., Pike, C. William, Muralidharan, Jananee, Hui, Gavin, Alexander, Natasha, Hassan, Hadeel, Nene, Rahul V., Pike, Morgan, Pokrzywa, Courtney J., Vedak, Shivam, Yan, Adam Paul, Yao, Dong-han, Zipursky, Amy R., Dinh, Christina, Ballentine, Philip, Derieg, Dan C., Polony, Vladimir, Chawdry, Rehan N., Davies, Jordan, Hyde, Brigham B., Shah, Nigam H., Gombar, Saurabh
Evidence to guide healthcare decisions is often limited by a lack of relevant and trustworthy literature, as well as difficulty in contextualizing existing research for a specific patient. Large language models (LLMs) could potentially address both ch…
External link:
http://arxiv.org/abs/2407.00541
Data pruning, the combinatorial task of selecting a small and informative subset from a large dataset, is crucial for mitigating the enormous computational costs associated with training data-hungry modern deep learning models at scale. Since large-s…
External link:
http://arxiv.org/abs/2406.17188
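The abstract frames data pruning as picking a small, informative subset of a large dataset. A common score-and-keep baseline (an assumption here, not necessarily the selection rule studied in the paper) is to score each example, for instance by model loss, and retain the top fraction:

```python
# Minimal score-based data pruning sketch: keep the k highest-scoring examples.
# Using per-example loss as the score is an assumed baseline, not necessarily
# the rule analysed in the paper.
import numpy as np

def prune_by_score(scores, keep_fraction=0.1):
    """Return indices of the top `keep_fraction` examples by score."""
    k = max(1, int(len(scores) * keep_fraction))
    return np.argsort(scores)[-k:]          # indices of the k largest scores

rng = np.random.default_rng(0)
per_example_loss = rng.random(100_000)      # stand-in for real model losses
subset = prune_by_score(per_example_loss, keep_fraction=0.1)
print(len(subset))                          # 10000 examples retained
```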
Author:
Das, Rudrajit, Dhillon, Inderjit S., Epasto, Alessandro, Javanmard, Adel, Mao, Jieming, Mirrokni, Vahab, Sanghavi, Sujay, Zhong, Peilin
The performance of a model trained with noisy labels is often improved by simply retraining the model with its own predicted hard labels (i.e., 1/0 labels). Yet, a detailed theoretical characterization of this phenomeno…
External link:
http://arxiv.org/abs/2406.11206
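The abstract refers to retraining a model on its own predicted hard (1/0) labels after an initial fit on noisy labels. Below is a minimal sketch of that loop using scikit-learn logistic regression; the synthetic data, the model choice, and the single retraining round are assumptions made purely for illustration.

```python
# Sketch of retraining on self-predicted hard labels (illustration only):
# fit on noisy labels, replace the labels with the model's own 0/1 predictions,
# then fit again. Data generation and logistic regression are assumptions,
# not the paper's setup.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))
y_clean = (X[:, 0] + X[:, 1] > 0).astype(int)
flip = rng.random(2000) < 0.2                          # 20% label noise
y_noisy = np.where(flip, 1 - y_clean, y_clean)

model = LogisticRegression().fit(X, y_noisy)           # first pass on noisy labels
hard_labels = model.predict(X)                         # model's own 0/1 predictions
retrained = LogisticRegression().fit(X, hard_labels)   # retrain on hard labels

print("noisy-label fit accuracy:", model.score(X, y_clean))
print("retrained accuracy:      ", retrained.score(X, y_clean))
```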
Author:
Li, Jeffrey, Fang, Alex, Smyrnis, Georgios, Ivgi, Maor, Jordan, Matt, Gadre, Samir, Bansal, Hritik, Guha, Etash, Keh, Sedrick, Arora, Kushal, Garg, Saurabh, Xin, Rui, Muennighoff, Niklas, Heckel, Reinhard, Mercat, Jean, Chen, Mayee, Gururangan, Suchin, Wortsman, Mitchell, Albalak, Alon, Bitton, Yonatan, Nezhurina, Marianna, Abbas, Amro, Hsieh, Cheng-Yu, Ghosh, Dhruba, Gardner, Josh, Kilian, Maciej, Zhang, Hanlin, Shao, Rulin, Pratt, Sarah, Sanyal, Sunny, Ilharco, Gabriel, Daras, Giannis, Marathe, Kalyani, Gokaslan, Aaron, Zhang, Jieyu, Chandu, Khyathi, Nguyen, Thao, Vasiljevic, Igor, Kakade, Sham, Song, Shuran, Sanghavi, Sujay, Faghri, Fartash, Oh, Sewoong, Zettlemoyer, Luke, Lo, Kyle, El-Nouby, Alaaeldin, Pouransari, Hadi, Toshev, Alexander, Wang, Stephanie, Groeneveld, Dirk, Soldaini, Luca, Koh, Pang Wei, Jitsev, Jenia, Kollar, Thomas, Dimakis, Alexandros G., Carmon, Yair, Dave, Achal, Schmidt, Ludwig, Shankar, Vaishaal
We introduce DataComp for Language Models (DCLM), a testbed for controlled dataset experiments with the goal of improving language models. As part of DCLM, we provide a standardized corpus of 240T tokens extracted from Common Crawl, effective pretrai…
External link:
http://arxiv.org/abs/2406.11794