Showing 1 - 10 of 802 for search: '"Bhagia A"'
Author:
Muennighoff, Niklas, Soldaini, Luca, Groeneveld, Dirk, Lo, Kyle, Morrison, Jacob, Min, Sewon, Shi, Weijia, Walsh, Pete, Tafjord, Oyvind, Lambert, Nathan, Gu, Yuling, Arora, Shane, Bhagia, Akshita, Schwenk, Dustin, Wadden, David, Wettig, Alexander, Hui, Binyuan, Dettmers, Tim, Kiela, Douwe, Farhadi, Ali, Smith, Noah A., Koh, Pang Wei, Singh, Amanpreet, Hajishirzi, Hannaneh
We introduce OLMoE, a fully open, state-of-the-art language model leveraging sparse Mixture-of-Experts (MoE). OLMoE-1B-7B has 7 billion (B) parameters but uses only 1B per input token. We pretrain it on 5 trillion tokens and further adapt it to create… (A minimal sketch of sparse expert routing follows the link below.)
External link:
http://arxiv.org/abs/2409.02060
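The core mechanism behind the 7B-total / 1B-active split is sparse expert routing: each token is sent to only the top-k of many expert feed-forward networks, so active parameters per token stay far below the total. Below is a minimal, illustrative PyTorch sketch; the layer sizes, gating scheme, and class names are assumptions for exposition, not OLMoE's actual implementation.

```python
import torch
import torch.nn as nn

class SparseMoE(nn.Module):
    """Toy top-k routed Mixture-of-Experts feed-forward layer."""

    def __init__(self, d_model=512, d_ff=1024, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)  # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):  # x: (n_tokens, d_model)
        gates = self.router(x).softmax(dim=-1)         # (n_tokens, n_experts)
        weights, idx = gates.topk(self.top_k, dim=-1)  # route each token to top-k experts
        weights = weights / weights.sum(-1, keepdim=True)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            sel = (idx == e).any(dim=-1)               # tokens routed to expert e
            if sel.any():
                w = weights[sel][idx[sel] == e].unsqueeze(-1)
                out[sel] = out[sel] + w * expert(x[sel])
        return out
```

Only the selected experts run for a given token, which is what keeps per-token compute near the "1B active parameters" budget while total capacity is much larger.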
Author:
Groeneveld, Dirk, Beltagy, Iz, Walsh, Pete, Bhagia, Akshita, Kinney, Rodney, Tafjord, Oyvind, Jha, Ananya Harsh, Ivison, Hamish, Magnusson, Ian, Wang, Yizhong, Arora, Shane, Atkinson, David, Authur, Russell, Chandu, Khyathi Raghavi, Cohan, Arman, Dumas, Jennifer, Elazar, Yanai, Gu, Yuling, Hessel, Jack, Khot, Tushar, Merrill, William, Morrison, Jacob, Muennighoff, Niklas, Naik, Aakanksha, Nam, Crystal, Peters, Matthew E., Pyatkin, Valentina, Ravichander, Abhilasha, Schwenk, Dustin, Shah, Saurabh, Smith, Will, Strubell, Emma, Subramani, Nishant, Wortsman, Mitchell, Dasigi, Pradeep, Lambert, Nathan, Richardson, Kyle, Zettlemoyer, Luke, Dodge, Jesse, Lo, Kyle, Soldaini, Luca, Smith, Noah A., Hajishirzi, Hannaneh
Language models (LMs) have become ubiquitous in both NLP research and in commercial product offerings. As their commercial importance has surged, the most powerful models have become closed off, gated behind proprietary interfaces, with important details…
External link:
http://arxiv.org/abs/2402.00838
Author:
Soldaini, Luca, Kinney, Rodney, Bhagia, Akshita, Schwenk, Dustin, Atkinson, David, Authur, Russell, Bogin, Ben, Chandu, Khyathi, Dumas, Jennifer, Elazar, Yanai, Hofmann, Valentin, Jha, Ananya Harsh, Kumar, Sachin, Lucy, Li, Lyu, Xinxi, Lambert, Nathan, Magnusson, Ian, Morrison, Jacob, Muennighoff, Niklas, Naik, Aakanksha, Nam, Crystal, Peters, Matthew E., Ravichander, Abhilasha, Richardson, Kyle, Shen, Zejiang, Strubell, Emma, Subramani, Nishant, Tafjord, Oyvind, Walsh, Pete, Zettlemoyer, Luke, Smith, Noah A., Hajishirzi, Hannaneh, Beltagy, Iz, Groeneveld, Dirk, Dodge, Jesse, Lo, Kyle
Information about pretraining corpora used to train the current best-performing language models is seldom discussed: commercial models rarely detail their data, and even open models are often released without accompanying training data or recipes to…
External link:
http://arxiv.org/abs/2402.00159
Author:
Magnusson, Ian, Bhagia, Akshita, Hofmann, Valentin, Soldaini, Luca, Jha, Ananya Harsh, Tafjord, Oyvind, Schwenk, Dustin, Walsh, Evan Pete, Elazar, Yanai, Lo, Kyle, Groeneveld, Dirk, Beltagy, Iz, Hajishirzi, Hannaneh, Smith, Noah A., Richardson, Kyle, Dodge, Jesse
Language models (LMs) commonly report perplexity on monolithic data held out from training. Implicitly or explicitly, this data is composed of domains – varying distributions of language. Rather than assuming perplexity on one distribution… (A per-domain perplexity sketch follows the link below.)
External link:
http://arxiv.org/abs/2312.10523
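As a concrete reading of "perplexity per domain": perplexity is exp(mean per-token negative log-likelihood), computed separately within each domain rather than over one monolithic held-out set. A small illustrative helper (not Paloma's code; the input format is an assumption):

```python
import math
from collections import defaultdict

def perplexity_by_domain(token_nlls):
    """token_nlls: iterable of (domain, nll) pairs, one per token."""
    totals = defaultdict(lambda: [0.0, 0])
    for domain, nll in token_nlls:
        totals[domain][0] += nll  # sum of negative log-likelihoods
        totals[domain][1] += 1    # token count
    return {d: math.exp(s / n) for d, (s, n) in totals.items()}

# Example: perplexity_by_domain([("web", 3.2), ("web", 2.9), ("code", 1.4)])
# -> {"web": ~21.1, "code": ~4.06}
```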
Author:
Groeneveld, Dirk, Awadalla, Anas, Beltagy, Iz, Bhagia, Akshita, Magnusson, Ian, Peng, Hao, Tafjord, Oyvind, Walsh, Pete, Richardson, Kyle, Dodge, Jesse
The success of large language models has shifted the evaluation paradigms in natural language processing (NLP). The community's interest has drifted towards comparing NLP models across many tasks, domains, and datasets, often at an extreme scale. This… (A toy sketch of a unified evaluation interface follows the link below.)
External link:
http://arxiv.org/abs/2312.10253
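One way to see why a unified interface helps at this scale: evaluating m models on n tasks naively needs m × n bespoke harnesses, while a shared interface needs only m + n adapters. A toy sketch of that idea (hypothetical names; not Catwalk's actual API):

```python
from dataclasses import dataclass

@dataclass
class Example:
    prompt: str
    answer: str

def evaluate(predict, examples):
    """Accuracy of one model (a predict(prompt) -> answer callable) on one task."""
    return sum(predict(ex.prompt) == ex.answer for ex in examples) / len(examples)

def evaluate_all(models, tasks):
    """models: {name: predict_fn}; tasks: {name: [Example, ...]}."""
    return {(m, t): evaluate(fn, exs)
            for m, fn in models.items() for t, exs in tasks.items()}
```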
Author:
Elazar, Yanai, Bhagia, Akshita, Magnusson, Ian, Ravichander, Abhilasha, Schwenk, Dustin, Suhr, Alane, Walsh, Pete, Groeneveld, Dirk, Soldaini, Luca, Singh, Sameer, Hajishirzi, Hanna, Smith, Noah A., Dodge, Jesse
Large text corpora are the backbone of language models. However, we have a limited understanding of the content of these corpora, including general statistics, quality, social factors, and inclusion of evaluation data (contamination). In this work, we… (Illustrative corpus-statistics and contamination checks follow the link below.)
External link:
http://arxiv.org/abs/2310.20707
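Two of the analyses the abstract names, general statistics and contamination, reduce conceptually to passes over the corpus. An illustrative, deliberately non-scalable sketch (WIMBD's real tooling and function names differ):

```python
def corpus_stats(documents):
    """Basic statistics over a list of document strings."""
    lengths = [len(doc.split()) for doc in documents]
    return {"docs": len(documents),
            "whitespace_tokens": sum(lengths),
            "duplicate_docs": len(documents) - len(set(documents))}

def contamination_rate(documents, eval_examples):
    """Fraction of evaluation examples appearing verbatim in some document."""
    hits = sum(any(ex in doc for doc in documents) for ex in eval_examples)
    return hits / len(eval_examples)
```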
Author:
Bhagia, Div
This paper presents a novel approach to distinguish the impact of duration-dependent forces and adverse selection on the exit rate from unemployment by leveraging variation in the length of layoff notices. I formulate a Mixed Hazard model in discrete time… (A standard mixed-hazard decomposition follows the link below.)
External link:
http://arxiv.org/abs/2305.17344
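For context, the identification problem the abstract describes can be written in a standard discrete-time mixed hazard form (a textbook specification assumed here; the paper's exact model may differ). With multiplicative unobserved heterogeneity $\nu$ and structural duration dependence $\psi(t)$,

$$h(t \mid \nu) = \nu\,\psi(t), \qquad \bar{h}(t) = \psi(t)\,\mathbb{E}[\nu \mid T \ge t].$$

Because high-$\nu$ workers exit first, $\mathbb{E}[\nu \mid T \ge t]$ falls with $t$, so the observed hazard $\bar{h}(t)$ conflates true duration dependence with dynamic (adverse) selection; separating the two requires outside variation, here the length of layoff notices.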
Recent NLP models have shown the remarkable ability to effectively generalise 'zero-shot' to new tasks using only natural language instructions as guidance. However, many of these approaches suffer from high computational costs due to their reliance…
External link:
http://arxiv.org/abs/2212.10315
Author:
Wu, Zhaofeng, Logan IV, Robert L., Walsh, Pete, Bhagia, Akshita, Groeneveld, Dirk, Singh, Sameer, Beltagy, Iz
Recently introduced language model prompting methods can achieve high accuracy in zero- and few-shot settings while requiring few to no learned task-specific parameters. Nevertheless, these methods still often trail behind full model finetuning. In this…
External link:
http://arxiv.org/abs/2210.10258
Author:
FitzGerald, Jack, Ananthakrishnan, Shankar, Arkoudas, Konstantine, Bernardi, Davide, Bhagia, Abhishek, Bovi, Claudio Delli, Cao, Jin, Chada, Rakesh, Chauhan, Amit, Chen, Luoxin, Dwarakanath, Anurag, Dwivedi, Satyam, Gojayev, Turan, Gopalakrishnan, Karthik, Gueudre, Thomas, Hakkani-Tur, Dilek, Hamza, Wael, Hueser, Jonathan, Jose, Kevin Martin, Khan, Haidar, Liu, Beiye, Lu, Jianhua, Manzotti, Alessandro, Natarajan, Pradeep, Owczarzak, Karolina, Oz, Gokmen, Palumbo, Enrico, Peris, Charith, Prakash, Chandana Satya, Rawls, Stephen, Rosenbaum, Andy, Shenoy, Anjali, Soltan, Saleh, Sridhar, Mukund Harakere, Tan, Liz, Triefenbach, Fabian, Wei, Pan, Yu, Haiyang, Zheng, Shuai, Tur, Gokhan, Natarajan, Prem
Published in:
Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '22), August 14-18, 2022, Washington, DC, USA
We present results from a large-scale experiment on pretraining encoders with non-embedding parameter counts ranging from 700M to 9.3B, their subsequent distillation into smaller models ranging from 17M-170M parameters, and their application to the Natural Language Understanding (NLU)… (A generic distillation-loss sketch follows the link below.)
External link:
http://arxiv.org/abs/2206.07808
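Distillation of the kind described above is commonly implemented by training the small student to match the large teacher's softened output distribution. A generic sketch (the paper's actual distillation objective and hyperparameters are not specified here):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions."""
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_student = F.log_softmax(student_logits / t, dim=-1)
    # Scale by t^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * t * t
```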