Showing 1 - 10 of 10 for search: '"Simig, Daniel"'
Over recent years, an increasing amount of compute and data has been poured into training large language models (LLMs), usually by doing one-pass learning on as many tokens as possible randomly selected from large-scale web corpora. While training on …
External link: http://arxiv.org/abs/2308.12284
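The snippet above describes pretraining as a single pass over tokens drawn at random from a web corpus. A minimal sketch of such a one-pass data stream, using a made-up three-document corpus and a whitespace tokenizer as stand-ins (neither is the paper's actual data or tooling):

    import random

    # Toy stand-ins for a web corpus and a tokenizer; both are illustrative
    # assumptions, not the authors' actual data or pipeline.
    corpus = [
        "the cat sat on the mat",
        "large language models need a lot of data",
        "web text is noisy and repetitive",
    ]

    def tokenize(document):
        return document.split()

    def one_pass_stream(documents, seed=0):
        """Yield every document's tokens exactly once, in random order (a single epoch)."""
        order = list(range(len(documents)))
        random.Random(seed).shuffle(order)
        for index in order:
            yield from tokenize(documents[index])

    tokens = list(one_pass_stream(corpus))
    print(len(tokens), tokens[:5])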
Author: Han, Xiaochuang, Simig, Daniel, Mihaylov, Todor, Tsvetkov, Yulia, Celikyilmaz, Asli, Wang, Tianlu
In-context learning (ICL) improves language models' performance on a variety of NLP tasks by simply demonstrating a handful of examples at inference time. It is not well understood why ICL ability emerges, as the model has never been specifically trained …
External link: http://arxiv.org/abs/2306.15091
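The entry above describes in-context learning as demonstrating a handful of examples at inference time. A minimal sketch of how such a few-shot prompt is typically assembled; the task, demonstrations, and labels are invented for illustration and are not taken from the paper:

    # Build a few-shot prompt for a toy sentiment task. A real ICL setup would
    # draw demonstrations from a labeled dataset and feed the prompt to an LLM.
    demonstrations = [
        ("The movie was a delight from start to finish.", "positive"),
        ("I regret buying this blender.", "negative"),
        ("The service was quick and friendly.", "positive"),
    ]
    query = "The battery died after two days."

    prompt = ""
    for text, label in demonstrations:
        prompt += f"Review: {text}\nSentiment: {label}\n\n"
    prompt += f"Review: {query}\nSentiment:"

    print(prompt)  # the model is expected to continue with the label "negative"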
Author: Cadavid-Sanchez, Sebastian, Kacem, Khalil, Frade, Rafael Aparecido Martins, Boehm, Johannes, Chaney, Thomas, Lashkari, Danial, Simig, Daniel
To study social, economic, and historical questions, researchers in the social sciences and humanities have started to use increasingly large unstructured textual datasets. While recent advances in NLP provide many tools to efficiently process such data …
External link: http://arxiv.org/abs/2305.14588
Autoregressive transformers are spectacular models for short sequences but scale poorly to long sequences such as high-resolution images, podcasts, code, or books. We proposed Megabyte, a multi-scale decoder architecture that enables end-to-end differentiable …
External link: http://arxiv.org/abs/2305.07185
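The entry above concerns autoregressive modeling of very long byte sequences with a multi-scale decoder. A back-of-the-envelope comparison of pairwise attention cost for a flat byte-level model versus a two-level model that attends over fixed-size patches globally and over bytes locally; the sequence length and patch size below are illustrative assumptions, not the paper's configuration:

    T = 1_000_000   # sequence length in bytes (illustrative)
    P = 8           # bytes per patch (illustrative)

    flat_cost = T ** 2                   # one global attention over all bytes
    global_cost = (T // P) ** 2          # attention over patch representations
    local_cost = (T // P) * P ** 2       # attention within each patch, summed over patches
    patched_cost = global_cost + local_cost

    print(f"flat:    {flat_cost:.3e} pairwise interactions")
    print(f"patched: {patched_cost:.3e} pairwise interactions")
    print(f"reduction: {flat_cost / patched_cost:.0f}x")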
Progress in machine learning has been driven in large part by massive increases in data. However, large web-scale datasets such as LAION are largely uncurated beyond searches for exact duplicates, potentially leaving much redundancy. Here, we introduce …
External link: http://arxiv.org/abs/2303.09540
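The entry above notes that web-scale datasets are typically curated only by searching for exact duplicates. A minimal sketch of that kind of exact-match filtering via content hashing; the sample captions are invented, and the truncated abstract goes on to address the redundancy that such filtering misses:

    import hashlib

    # Exact-duplicate filtering of the kind the entry says web-scale datasets
    # already receive; the captions below are invented for illustration.
    captions = [
        "a photo of a dog on a beach",
        "a photo of a dog on a beach",
        "sunset over the mountains",
    ]

    seen, kept = set(), []
    for text in captions:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest not in seen:       # keep only the first occurrence of each exact string
            seen.add(digest)
            kept.append(text)

    print(kept)  # near-duplicates with small wording changes would still slip through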
Author: Iyer, Srinivasan, Lin, Xi Victoria, Pasunuru, Ramakanth, Mihaylov, Todor, Simig, Daniel, Yu, Ping, Shuster, Kurt, Wang, Tianlu, Liu, Qing, Koura, Punit Singh, Li, Xian, O'Horo, Brian, Pereyra, Gabriel, Wang, Jeff, Dewan, Christopher, Celikyilmaz, Asli, Zettlemoyer, Luke, Stoyanov, Ves
Recent work has shown that fine-tuning large pre-trained language models on a collection of tasks described via instructions, a.k.a. instruction-tuning, improves their zero and few-shot generalization to unseen tasks. However, there is a limited understanding …
External link: http://arxiv.org/abs/2212.12017
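The entry above defines instruction-tuning as fine-tuning on a collection of tasks described via natural-language instructions. A minimal sketch of rendering labeled task examples into (prompt, target) training pairs; the templates and examples are hypothetical, not the paper's own formats:

    # Render labeled task examples into instruction-following training pairs.
    # Task wording and examples here are invented placeholders.
    tasks = [
        {
            "instruction": "Translate the sentence to French.",
            "examples": [("Good morning.", "Bonjour.")],
        },
        {
            "instruction": "Classify the sentiment as positive or negative.",
            "examples": [("I loved this book.", "positive")],
        },
    ]

    training_pairs = []
    for task in tasks:
        for source, target in task["examples"]:
            prompt = f"{task['instruction']}\n\nInput: {source}\nOutput:"
            training_pairs.append((prompt, target))

    for prompt, target in training_pairs:
        print(repr(prompt), "->", repr(target))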
Author: Simig, Daniel, Wang, Tianlu, Dankers, Verna, Henderson, Peter, Batsuren, Khuyagbaatar, Hupkes, Dieuwke, Diab, Mona
In NLP, models are usually evaluated by reporting single-number performance scores on a number of readily available benchmarks, without much deeper analysis. Here, we argue that - especially given the well-known fact that benchmarks often contain biases …
External link: http://arxiv.org/abs/2210.01734
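The entry above argues against reporting only single-number benchmark scores. A minimal sketch of slicing accuracy by one simple data property (input length) instead of a single aggregate; the example predictions and the length threshold are invented:

    # Compare one aggregate accuracy with accuracy broken down by input length.
    examples = [
        {"text": "short input", "correct": True},
        {"text": "another short one", "correct": True},
        {"text": "a much longer input that the model might handle differently", "correct": False},
        {"text": "yet another fairly long input with many more tokens in it", "correct": False},
    ]

    def accuracy(items):
        return sum(e["correct"] for e in items) / len(items)

    short = [e for e in examples if len(e["text"].split()) <= 4]
    long = [e for e in examples if len(e["text"].split()) > 4]

    print(f"overall: {accuracy(examples):.2f}")
    print(f"short inputs: {accuracy(short):.2f}, long inputs: {accuracy(long):.2f}")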
Author: Simig, Daniel, Petroni, Fabio, Yanki, Pouya, Popat, Kashyap, Du, Christina, Riedel, Sebastian, Yazdani, Majid
The extreme multi-label classification (XMC) task aims at tagging content with a subset of labels from an extremely large label set. The label vocabulary is typically defined in advance by domain experts and assumed to capture all necessary tags. However, …
External link: http://arxiv.org/abs/2205.05812
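The entry above describes extreme multi-label classification as tagging content with a subset of labels from an extremely large, pre-defined label set. A minimal sketch of the standard setup with a linear scorer over the label vocabulary and top-k selection; the label count, feature size, and random weights are placeholders:

    import numpy as np

    rng = np.random.default_rng(0)
    num_labels, dim = 100_000, 64        # "extremely large" label vocabulary, toy feature size

    label_embeddings = rng.standard_normal((num_labels, dim))   # one row per label
    document = rng.standard_normal(dim)                          # feature vector for one document

    scores = label_embeddings @ document                         # score every label in the vocabulary
    k = 5
    top_k = np.argsort(scores)[-k:][::-1]                        # indices of the k highest-scoring labels

    print("predicted label ids:", top_k.tolist())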
Author: Zhang, Susan, Roller, Stephen, Goyal, Naman, Artetxe, Mikel, Chen, Moya, Chen, Shuohui, Dewan, Christopher, Diab, Mona, Li, Xian, Lin, Xi Victoria, Mihaylov, Todor, Ott, Myle, Shleifer, Sam, Shuster, Kurt, Simig, Daniel, Koura, Punit Singh, Sridhar, Anjali, Wang, Tianlu, Zettlemoyer, Luke
Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning. Given their computational cost, these models are difficult to replicate without significant …
External link: http://arxiv.org/abs/2205.01068
Author: Lin, Xi Victoria, Mihaylov, Todor, Artetxe, Mikel, Wang, Tianlu, Chen, Shuohui, Simig, Daniel, Ott, Myle, Goyal, Naman, Bhosale, Shruti, Du, Jingfei, Pasunuru, Ramakanth, Shleifer, Sam, Koura, Punit Singh, Chaudhary, Vishrav, O'Horo, Brian, Wang, Jeff, Zettlemoyer, Luke, Kozareva, Zornitsa, Diab, Mona, Stoyanov, Veselin, Li, Xian
Large-scale generative language models such as GPT-3 are competitive few-shot learners. While these models are known to be able to jointly represent many different languages, their training data is dominated by English, potentially limiting their cross-lingual …
External link: http://arxiv.org/abs/2112.10668