Showing 1 - 10 of 39 for search: '"Wadden, David"'
Author:
Muennighoff, Niklas, Soldaini, Luca, Groeneveld, Dirk, Lo, Kyle, Morrison, Jacob, Min, Sewon, Shi, Weijia, Walsh, Pete, Tafjord, Oyvind, Lambert, Nathan, Gu, Yuling, Arora, Shane, Bhagia, Akshita, Schwenk, Dustin, Wadden, David, Wettig, Alexander, Hui, Binyuan, Dettmers, Tim, Kiela, Douwe, Farhadi, Ali, Smith, Noah A., Koh, Pang Wei, Singh, Amanpreet, Hajishirzi, Hannaneh
We introduce OLMoE, a fully open, state-of-the-art language model leveraging sparse Mixture-of-Experts (MoE). OLMoE-1B-7B has 7 billion (B) parameters but uses only 1B per input token. We pretrain it on 5 trillion tokens and further adapt it to create…
External link:
http://arxiv.org/abs/2409.02060
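The OLMoE entry above hinges on one mechanism, sparse Mixture-of-Experts routing: only a few expert feed-forward blocks are evaluated per token, which is why the active parameter count (about 1B) is far below the total (7B). A minimal illustrative sketch of top-k routing in Python follows; it is not OLMoE's implementation, and all sizes are toy values chosen for illustration.

# Toy sketch of top-k Mixture-of-Experts routing (illustrative, not OLMoE's code).
# Each token passes through only `top_k` of `n_experts` expert blocks, so the
# parameters active per token are far fewer than the total parameters.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2                                        # toy sizes
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]   # toy expert "FFNs"
router = rng.normal(size=(d_model, n_experts))                              # routing weights

def moe_layer(x):
    logits = x @ router                                  # one score per expert
    chosen = np.argsort(logits)[-top_k:]                 # indices of the k highest-scoring experts
    weights = np.exp(logits[chosen])
    weights /= weights.sum()                             # softmax over the chosen experts only
    # only k expert matrices are used for this token; the rest stay untouched
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

print(moe_layer(rng.normal(size=d_model)).shape)         # (16,)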
Author:
Hsu, Chao-Chun, Bransom, Erin, Sparks, Jenna, Kuehl, Bailey, Tan, Chenhao, Wadden, David, Wang, Lucy Lu, Naik, Aakanksha
Literature review requires researchers to synthesize a large amount of information and is increasingly challenging as the scientific literature expands. In this work, we investigate the potential of LLMs for producing hierarchical organizations of scientific…
External link:
http://arxiv.org/abs/2407.16148
Author:
Wadden, David, Shi, Kejian, Morrison, Jacob, Naik, Aakanksha, Singh, Shruti, Barzilay, Nitzan, Lo, Kyle, Hope, Tom, Soldaini, Luca, Shen, Shannon Zejiang, Downey, Doug, Hajishirzi, Hannaneh, Cohan, Arman
We present SciRIFF (Scientific Resource for Instruction-Following and Finetuning), a dataset of 137K instruction-following demonstrations for 54 tasks covering five essential scientific literature understanding capabilities: information extraction, summarization…
External link:
http://arxiv.org/abs/2406.07835
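To make "instruction-following demonstrations" concrete, here is a small illustrative record in Python of the kind such a dataset typically contains; the field names and content are assumptions for illustration, not SciRIFF's actual schema.

# Illustrative only: field names and values are assumed, not SciRIFF's schema.
demonstration = {
    "task": "information_extraction",
    "instruction": "List the datasets mentioned in the abstract below.",
    "input": "We evaluate our approach on SciFact and SciERC ...",
    "output": ["SciFact", "SciERC"],
}

# A finetuning pipeline would serialize instruction + input as the prompt
# and train the model to generate the output.
prompt = demonstration["instruction"] + "\n\n" + demonstration["input"]
print(prompt)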
Author:
Khalifa, Muhammad, Wadden, David, Strubell, Emma, Lee, Honglak, Wang, Lu, Beltagy, Iz, Peng, Hao
Large language models (LLMs) learn a vast amount of knowledge during pretraining, but they are often oblivious to the source(s) of such knowledge. We investigate the problem of intrinsic source citation, where LLMs are required to cite the pretraining…
External link:
http://arxiv.org/abs/2404.01019
Large language models (LLMs) adapted to follow user instructions are now widely deployed as conversational agents. In this work, we examine one increasingly common instruction-following task: providing writing assistance to compose a long-form answer…
External link:
http://arxiv.org/abs/2403.03866
Author:
Ivison, Hamish, Wang, Yizhong, Pyatkin, Valentina, Lambert, Nathan, Peters, Matthew, Dasigi, Pradeep, Jang, Joel, Wadden, David, Smith, Noah A., Beltagy, Iz, Hajishirzi, Hannaneh
Since the release of TÜLU [Wang et al., 2023b], open resources for instruction tuning have developed quickly, from better base models to new finetuning techniques. We test and incorporate a number of these advances into TÜLU, resulting in TÜLU…
External link:
http://arxiv.org/abs/2311.10702
Recent progress in natural language processing (NLP) owes much to remarkable advances in large language models (LLMs). Nevertheless, LLMs frequently "hallucinate," resulting in non-factual outputs. Our carefully-designed human evaluation substantiates…
External link:
http://arxiv.org/abs/2310.14564
Author:
Weller, Orion, Lo, Kyle, Wadden, David, Lawrie, Dawn, Van Durme, Benjamin, Cohan, Arman, Soldaini, Luca
Using large language models (LMs) for query or document expansion can improve generalization in information retrieval. However, it is unknown whether these techniques are universally beneficial or only effective in specific settings, such as for particular…
External link:
http://arxiv.org/abs/2309.08541
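The Weller et al. entry above concerns LM-based query and document expansion for retrieval. As a rough illustration of the query-expansion side, the sketch below appends LM-generated text to the original query before retrieval scoring; the generate callable and the example strings are assumptions, not the paper's setup.

# Rough sketch of LM-based query expansion (assumed workflow, not the paper's code).
def expand_query(query: str, generate) -> str:
    # `generate` is any callable mapping a prompt string to LM-produced text
    expansion = generate(f"Write a short passage that could answer: {query}")
    return f"{query} {expansion}"      # the expanded string is what the retriever scores

# Usage with a stand-in generator:
expanded = expand_query(
    "health effects of green tea",
    generate=lambda prompt: "Green tea contains catechins, which have been studied for ...",
)
print(expanded)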
What is the effect of releasing a preprint of a paper before it is submitted for peer review? No randomized controlled trial has been conducted, so we turn to observational data to answer this question. We use data from the ICLR conference (2018–202…
External link:
http://arxiv.org/abs/2306.13891
Author:
Wang, Yizhong, Ivison, Hamish, Dasigi, Pradeep, Hessel, Jack, Khot, Tushar, Chandu, Khyathi Raghavi, Wadden, David, MacMillan, Kelsey, Smith, Noah A., Beltagy, Iz, Hajishirzi, Hannaneh
In this work we explore recent advances in instruction-tuning language models on a range of open instruction-following datasets. Despite recent claims that open models can be on par with state-of-the-art proprietary models, these claims are often accompanied…
External link:
http://arxiv.org/abs/2306.04751