El Volumen Louder Por Favor: Code-switching in Task-oriented Semantic Parsing

Authors:	Sonal Gupta, Lorena Sainz-Maza Lecanda, Abhinav Arora, Arash Einolghozati, Anuj Kumar
Year of publication:	2021
Subjects:	FOS: Computer and information sciences; Hindi; Computer Science - Machine Learning; Computer Science - Computation and Language; Parsing; Computer Science - Artificial Intelligence; Computer science; business.industry; computer.software_genre; Code-switching; language.human_language; Machine Learning (cs.LG); Focus (linguistics); Zero (linguistics); Artificial Intelligence (cs.AI); language; Generalizability theory; Language model; Artificial intelligence; business; Computation and Language (cs.CL); computer; Spanglish; Natural language processing
Source:	EACL
Description:	Being able to parse code-switched (CS) utterances, such as Spanish+English or Hindi+English, is essential to democratize task-oriented semantic parsing systems for certain locales. In this work, we focus on Spanglish (Spanish+English) and release a dataset, CSTOP, containing 5800 CS utterances alongside their semantic parses. We examine the CS generalizability of various cross-lingual (XL) models and exhibit the advantage of pre-trained XL language models when data for only one language is present. As such, we focus on improving the pre-trained models for the case when only an English corpus, alongside either zero or a few CS training instances, is available. We propose two data augmentation methods for the zero-shot and the few-shot settings: fine-tuning using translate-and-align, and augmentation using a generation model followed by match-and-filter. Combining the few-shot setting with the above improvements decreases the initial 30-point accuracy gap between the zero-shot and the full-data settings by two thirds.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::f1420390a587eeffe8c32099b3a695e4 https://doi.org/10.18653/v1/2021.eacl-main.87