Showing 1 - 10 of 20 for search: '"Rosenbaum, Andy"'
The emergence of Large Language Models (LLMs) with capabilities like In-Context Learning (ICL) has ushered in new possibilities for data generation across various domains while minimizing the need for extensive data collection and modeling techniques …
External link:
http://arxiv.org/abs/2404.09163
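As an illustration of the in-context-learning data-generation idea mentioned in the abstract above, here is a minimal Python sketch; the prompt wording, seed examples, and the injected `complete` callable are assumptions for illustration, not details from the paper.

```python
# Minimal sketch of in-context learning (ICL) for data generation.
# The prompt format, seed examples, and the injected `complete`
# callable are illustrative assumptions, not details from the paper.

from typing import Callable, List, Tuple

def build_icl_prompt(seed: List[Tuple[str, str]], target_label: str, k: int = 3) -> str:
    """Format up to k seed (text, label) pairs as demonstrations, then
    ask the model to produce a new example for `target_label`."""
    demos = "\n".join(f"label: {label}\ntext: {text}" for text, label in seed[:k])
    return f"{demos}\nlabel: {target_label}\ntext:"

def generate_examples(seed, target_label, complete: Callable[[str], str], n: int = 5):
    """Call the assumed LLM completion function n times and collect the
    generations as synthetic (text, label) training pairs."""
    prompt = build_icl_prompt(seed, target_label)
    return [(complete(prompt).strip(), target_label) for _ in range(n)]
```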
Pre-trained encoder-only and sequence-to-sequence (seq2seq) models each have advantages; however, training both model types from scratch is computationally expensive. We explore recipes to improve pre-training efficiency by initializing one model from …
External link:
http://arxiv.org/abs/2306.08756
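A minimal sketch of the warm-starting recipe the abstract hints at, assuming the seq2seq encoder shares the encoder-only model's architecture so its weights can be copied directly; the toy dimensions and vanilla PyTorch modules are stand-ins, not the paper's models.

```python
# Minimal sketch of warm-starting a seq2seq model from a pre-trained
# encoder. The point is only that matching encoder weights can be
# copied over while the decoder starts from a fresh initialization.

import torch.nn as nn

d_model, n_heads, n_layers = 256, 4, 4

# Pretend this encoder-only model has already been pre-trained.
encoder_only = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
    n_layers,
    norm=nn.LayerNorm(d_model),
)

# A seq2seq model whose encoder has the same architecture.
seq2seq = nn.Transformer(
    d_model=d_model, nhead=n_heads,
    num_encoder_layers=n_layers, num_decoder_layers=n_layers,
    batch_first=True,
)

# Copy the pre-trained encoder weights; the decoder keeps its random
# initialization and is trained during seq2seq pre-training.
seq2seq.encoder.load_state_dict(encoder_only.state_dict())
```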
Author:
Chen, Maximillian, Papangelis, Alexandros, Tao, Chenyang, Kim, Seokhwan, Rosenbaum, Andy, Liu, Yang, Yu, Zhou, Hakkani-Tur, Dilek
Collecting high quality conversational data can be very expensive for most applications and infeasible for others due to privacy, ethical, or similar concerns. A promising direction to tackle this problem is to generate synthetic dialogues by prompting …
External link:
http://arxiv.org/abs/2302.03269
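One way the dialogue-synthesis-by-prompting approach could look in practice is sketched below; the example conversation, the prompt template, and the `complete` callable are hypothetical and not taken from the paper.

```python
# Minimal sketch of synthesizing a dialogue by prompting an LLM. The
# example conversation, the prompt template, and the `complete`
# callable are hypothetical, not the paper's actual prompt.

EXAMPLE_DIALOGUE = (
    "A: I'm thinking about taking up running.\n"
    "B: That's great! Do you have a goal race in mind?\n"
    "A: Maybe a 5K in the spring.\n"
)

def dialogue_prompt(topic: str) -> str:
    return (
        "The following is a friendly conversation between two people.\n"
        f"{EXAMPLE_DIALOGUE}\n"
        f"Write another friendly conversation, this time about {topic}.\n"
        "A:"
    )

def synthesize_dialogue(topic: str, complete) -> list:
    """Generate one synthetic dialogue and split it into turns."""
    text = "A: " + complete(dialogue_prompt(topic))
    return [turn.strip() for turn in text.splitlines() if turn.strip()]
```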
Author:
Chen, Maximillian, Papangelis, Alexandros, Tao, Chenyang, Rosenbaum, Andy, Kim, Seokhwan, Liu, Yang, Yu, Zhou, Hakkani-Tur, Dilek
Dialogue understanding tasks often necessitate abundant annotated data to achieve good performance, and that presents challenges in low-resource settings. To alleviate this barrier, we explore few-shot data augmentation for dialogue understanding by …
External link:
http://arxiv.org/abs/2210.14169
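A hedged sketch of few-shot augmentation for dialogue understanding: generate candidate utterances for a label by prompting, then keep only candidates a seed-trained classifier is confident about. The confidence filter is a common safeguard assumed here, not necessarily this paper's procedure.

```python
# Minimal sketch of few-shot augmentation with a confidence filter.
# `complete` is an assumed LLM call; `classifier` is assumed to return
# (predicted_label, confidence) for an utterance; both are stand-ins.

def augment_label(seed, label, complete, classifier, n=20, min_conf=0.8):
    """seed: list of (utterance, label) pairs used as demonstrations."""
    demos = "\n".join(f"{l}: {u}" for u, l in seed if l == label)
    prompt = f"{demos}\n{label}:"
    kept = []
    for _ in range(n):
        candidate = complete(prompt).strip()
        predicted, confidence = classifier(candidate)
        if predicted == label and confidence >= min_conf:
            kept.append((candidate, label))
    return kept
```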
A bottleneck to developing Semantic Parsing (SP) models is the need for a large volume of human-labeled training data. Given the complexity and cost of human annotation for SP, labeled data is often scarce, particularly in multilingual settings. Large …
External link:
http://arxiv.org/abs/2210.07074
We present LINGUIST, a method for generating annotated data for Intent Classification and Slot Tagging (IC+ST), via fine-tuning AlexaTM 5B, a 5-billion-parameter multilingual sequence-to-sequence (seq2seq) model, on a flexible instruction prompt. …
External link:
http://arxiv.org/abs/2209.09900
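Generated IC+ST data carries slot annotations inline; the sketch below shows how bracket-style annotations could be converted into token-level tags for Slot Tagging. The `[slot value]` notation is a hypothetical stand-in, not the exact output format defined in the LINGUIST paper.

```python
# Minimal sketch of turning bracket-annotated generations into Slot
# Tagging training data. The "[slot value]" notation is a hypothetical
# stand-in for whatever annotated format the generator emits.

import re

def brackets_to_bio(annotated: str):
    """Convert inline [slot value] annotations into tokens plus B-/I-/O tags."""
    tokens, tags = [], []
    for match in re.finditer(r"\[(\w+) ([^\]]+)\]|(\S+)", annotated):
        slot, value, plain = match.groups()
        if plain is not None:
            tokens.append(plain)
            tags.append("O")
        else:
            words = value.split()
            tokens.extend(words)
            tags.extend([f"B-{slot}"] + [f"I-{slot}"] * (len(words) - 1))
    return tokens, tags

print(brackets_to_bio("play [artist taylor swift] on [device kitchen speaker]"))
# (['play', 'taylor', 'swift', 'on', 'kitchen', 'speaker'],
#  ['O', 'B-artist', 'I-artist', 'O', 'B-device', 'I-device'])
```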
Author:
Soltan, Saleh, Ananthakrishnan, Shankar, FitzGerald, Jack, Gupta, Rahul, Hamza, Wael, Khan, Haidar, Peris, Charith, Rawls, Stephen, Rosenbaum, Andy, Rumshisky, Anna, Prakash, Chandana Satya, Sridhar, Mukund, Triefenbach, Fabian, Verma, Apurv, Tur, Gokhan, Natarajan, Prem
In this work, we demonstrate that multilingual large-scale sequence-to-sequence (seq2seq) models, pre-trained on a mixture of denoising and Causal Language Modeling (CLM) tasks, are more efficient few-shot learners than decoder-only models on various …
External link:
http://arxiv.org/abs/2208.01448
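Few-shot use of a seq2seq model typically places the demonstrations in the encoder input and lets the decoder produce the answer; the sketch below uses a small public checkpoint as a stand-in for the large multilingual model the abstract describes.

```python
# Minimal sketch of few-shot prompting with a seq2seq model. The
# checkpoint, demonstrations, and task are stand-ins for illustration.

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "google/flan-t5-small"  # small public stand-in checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Demonstrations and the query go into the encoder input.
prompt = (
    "Review: The battery died after a week. Sentiment: negative\n"
    "Review: Setup took two minutes and it just works. Sentiment: positive\n"
    "Review: The sound is tinny and the app keeps crashing. Sentiment:"
)
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```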
Author:
FitzGerald, Jack, Ananthakrishnan, Shankar, Arkoudas, Konstantine, Bernardi, Davide, Bhagia, Abhishek, Bovi, Claudio Delli, Cao, Jin, Chada, Rakesh, Chauhan, Amit, Chen, Luoxin, Dwarakanath, Anurag, Dwivedi, Satyam, Gojayev, Turan, Gopalakrishnan, Karthik, Gueudre, Thomas, Hakkani-Tur, Dilek, Hamza, Wael, Hueser, Jonathan, Jose, Kevin Martin, Khan, Haidar, Liu, Beiye, Lu, Jianhua, Manzotti, Alessandro, Natarajan, Pradeep, Owczarzak, Karolina, Oz, Gokmen, Palumbo, Enrico, Peris, Charith, Prakash, Chandana Satya, Rawls, Stephen, Rosenbaum, Andy, Shenoy, Anjali, Soltan, Saleh, Sridhar, Mukund Harakere, Tan, Liz, Triefenbach, Fabian, Wei, Pan, Yu, Haiyang, Zheng, Shuai, Tur, Gokhan, Natarajan, Prem
Published in:
Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '22), August 14-18, 2022, Washington, DC, USA
We present results from a large-scale experiment on pretraining encoders with non-embedding parameter counts ranging from 700M to 9.3B, their subsequent distillation into smaller models ranging from 17M to 170M parameters, and their application to the …
External link:
http://arxiv.org/abs/2206.07808
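The distillation step mentioned in the abstract can be sketched as a standard teacher-student objective; the temperature, loss weighting, and exact formulation below are illustrative assumptions, not the paper's recipe.

```python
# Minimal sketch of a teacher-student distillation objective. The
# temperature, loss weighting, and use of soft targets are standard
# choices assumed here for illustration.

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Mix the KL divergence between temperature-softened teacher and
    student distributions with cross-entropy on the gold labels."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```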
Author:
FitzGerald, Jack, Ananthakrishnan, Shankar, Arkoudas, Konstantine, Bernardi, Davide, Bhagia, Abhishek, Bovi, Claudio Delli, Cao, Jin, Chada, Rakesh, Chauhan, Amit, Chen, Luoxin, Dwarakanath, Anurag, Dwivedi, Satyam, Gojayev, Turan, Gopalakrishnan, Karthik, Gueudre, Thomas, Hakkani-Tur, Dilek, Hamza, Wael, Hueser, Jonathan, Jose, Kevin Martin, Khan, Haidar, Liu, Beiye, Lu, Jianhua, Manzotti, Alessandro, Natarajan, Pradeep, Owczarzak, Karolina, Oz, Gokmen, Palumbo, Enrico, Peris, Charith, Prakash, Chandana Satya, Rawls, Stephen, Rosenbaum, Andy, Shenoy, Anjali, Soltan, Saleh, Sridhar, Mukund Harakere, Tan, Liz, Triefenbach, Fabian, Wei, Pan, Yu, Haiyang, Zheng, Shuai, Tur, Gokhan, Natarajan, Prem
Published in:
Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining.
We present results from a large-scale experiment on pretraining encoders with non-embedding parameter counts ranging from 700M to 9.3B, their subsequent distillation into smaller models ranging from 17M to 170M parameters, and their application to the …
Author:
Flory, Wendy, Anderson, David, Basarich, Joel, Booth, Fred, Dudka, Lee, Gonella, Joe, Goodman, Nancy, Kearns, George, Nightenhelser, Keith, Reid, Richard, Rosenbaum, Andy, Smith, Tom, Wilhelm, James, Odlin, Reno, Heckford, H.J., Moody, A.D.
Published in:
Paideuma, 1979 Apr 01. 8(1), 173-178.
External link:
https://www.jstor.org/stable/24724873