Multitask Prompted Training Enables Zero-Shot Task Generalization

Authors: Sanh, Victor, Webson, Albert, Raffel, Colin, Bach, Stephen, Sutawika, Lintang, Alyafeai, Zaid, Chaffin, Antoine, Stiegler, Arnaud, Le Scao, Teven, Raja, Arun, Dey, Manan, Bari, M Saiful, Xu, Canwen, Thakker, Urmish, Sharma, Shanya, Szczechla, Eliza, Kim, Taewoon, Chhablani, Gunjan, Nayak, Nihal V., Datta, Debajyoti, Chang, Jonathan, Jiang, Mike, Wang, Han, Manica, Matteo, Shen, Sheng, Yong, Zheng-Xin, Pandey, Harshit, Mckenna, Michael, Bawden, Rachel, Wang, Thomas, Neeraj, Trishala, Rozen, Jos, Sharma, Abheesht, Santilli, Andrea, Fevry, Thibault, Fries, Jason, Teehan, Ryan, Bers, Tali, Biderman, Stella, Gao, Leo, Wolf, Thomas, Rush, Alexander
Contributors: Hugging Face, Department of Computer Science (Brown University), Brown University, Konvergen AI, King Fahd University of Petroleum and Minerals (KFUPM), Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-Institut National de Recherche en Informatique et en Automatique (Inria)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique (IMT Atlantique), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT), IMATAG [Rennes], Hyperscience, Institute for Infocomm Research - I²R [Singapore], SAP AI Research (SAP AI), School of Computer Engineering [Singapore] (NTU), Nanyang Technological University [Singapore], Department of Computer Science and Engineering [Univ California San Diego] (CSE - UC San Diego), University of California [San Diego] (UC San Diego), University of California (UC)-University of California (UC), SambaNova Systems, Walmart Labs, Scott Tiger S.A., Vrije Universiteit Amsterdam [Amsterdam] (VU), Oracle, University of Virginia, AsusTeK Computer (ASUS), ZEALS, New York University [New York] (NYU), NYU System (NYU), IBM Research [Zurich], University of California [Berkeley] (UC Berkeley), University of California (UC), No affiliation, Parity, Automatic Language Modelling and ANAlysis & Computational Humanities (ALMAnaCH), Inria de Paris, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), CyberCube, Naver Labs Europe [Meylan], Birla Institute of Technology and Science (BITS Pilani), Università degli Studi di Roma 'La Sapienza' = Sapienza University [Rome] (UNIROMA), Point72, Stanford University, Charles River Analytics, EleutherAI, Booz Allen Hamilton Inc, ANR-19-P3IA-0001, PRAIRIE, PaRis Artificial Intelligence Research InstitutE (2019)
Language: English
Year of publication: 2022
Subject:
Source: ICLR 2022 - Tenth International Conference on Learning Representations, Apr 2022, Online
Description: International audience; Large language models have recently been shown to attain reasonable zero-shot generalization on a diverse set of tasks (Brown et al., 2020). It has been hypothesized that this is a consequence of implicit multitask learning in language models' pretraining (Radford et al., 2019). Can zero-shot generalization instead be directly induced by explicit multitask learning? To test this question at scale, we develop a system for easily mapping any natural language task into a human-readable prompted form. We convert a large set of supervised datasets, each with multiple prompts with diverse wording. These prompted datasets allow for benchmarking the ability of a model to perform completely held-out tasks. We fine-tune a pre-trained encoder-decoder model (Raffel et al., 2020; Lester et al., 2021) on this multitask mixture covering a wide variety of tasks. The model attains strong zero-shot performance on several standard datasets, often outperforming models up to 16x its size. Further, our approach attains strong performance on a subset of tasks from the BIG-bench benchmark, outperforming models up to 6x its size. All trained models are available at https://github.com/bigscience-workshop/t-zero, and all prompts are available at https://github.com/bigscience-workshop/promptsource.
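Note: since the released checkpoints are T5-style encoder-decoder models, zero-shot use amounts to feeding a prompted input to the encoder and decoding the answer as text. The following is a minimal sketch using the Hugging Face transformers library; the checkpoint identifier "bigscience/T0_3B" and the example prompt are assumptions for illustration, not details taken from this record.

# Minimal zero-shot inference sketch for a T0-style checkpoint.
# Assumption: the model is hosted on the Hugging Face Hub as "bigscience/T0_3B".
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "bigscience/T0_3B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# A natural-language prompt wrapping a classification task,
# in the spirit of the prompted datasets described above.
prompt = (
    "Is this review positive or negative? "
    "Review: this is the best cast iron skillet you will ever buy"
)
inputs = tokenizer(prompt, return_tensors="pt")

# The decoder produces the answer as free-form text (e.g. "Positive").
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

The prompted forms of the training datasets themselves can be browsed in the promptsource repository linked above.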
Database: OpenAIRE