Controlled Text Generation for Data Augmentation in Intelligent Artificial Agents
Author: | Angeliki Metallinou, Shuyang Gao, Abhishek Sethi, Nikolaos Malandrakis, Minmin Shen, Anuj Goyal |
---|---|
Year of publication: | 2019 |
Subject: | FOS: Computer and information sciences; Computer Science - Machine Learning; Training set; Computer Science - Computation and Language; Computer science; business.industry; Machine Learning (stat.ML); 02 engineering and technology; 010501 environmental sciences; Machine learning; computer.software_genre; 01 natural sciences; Bottleneck; Task (project management); Variety (cybernetics); Machine Learning (cs.LG); Statistics - Machine Learning; 0202 electrical engineering, electronic engineering, information engineering; Text generation; 020201 artificial intelligence & image processing; Artificial intelligence; business; computer; Computation and Language (cs.CL); 0105 earth and related environmental sciences |
Source: | NGT@EMNLP-IJCNLP |
DOI: | 10.48550/arxiv.1910.03487 |
Description: | Data availability is a bottleneck during the early stages of developing new capabilities for intelligent artificial agents. We investigate the use of text generation techniques to augment the training data of a popular commercial artificial agent across categories of functionality, with the goal of faster development of new functionality. We explore a variety of encoder-decoder generative models for synthetic training data generation and propose using conditional variational auto-encoders. Our approach requires only direct optimization, works well with limited data, and significantly outperforms previous controlled text generation techniques. Further, the generated data are used as additional training samples in an extrinsic intent classification task, improving performance by up to 5% absolute F-score in low-resource cases, which validates the usefulness of our approach. Comment: EMNLP WNGT workshop |
Database: | OpenAIRE |
External link: |