End-to-end spoken language understanding using transformer networks and self-supervised pre-trained features

Author:	Zoltán Tüske, Brian Kingsbury, Hong-Kwang J. Kuo, Samuel Thomas, Edmilson da Silva Morais
Language:	English
Publication year:	2020
Subject:	FOS: Computer and information sciences; Signal processing; Sound (cs.SD); Computer Science - Computation and Language; Computer science; Speech recognition; Initialization; Modular design; Speech processing; Field (computer science); Computer Science - Sound; End-to-end principle; Audio and Speech Processing (eess.AS); FOS: Electrical engineering, electronic engineering, information engineering; Computation and Language (cs.CL); Transformer (machine learning model); Spoken language; Electrical Engineering and Systems Science - Audio and Speech Processing
Source:	ICASSP
Description:	Transformer networks and self-supervised pre-training have consistently delivered state-of-the-art results in natural language processing (NLP); however, their merits in spoken language understanding (SLU) still need further investigation. In this paper we introduce a modular end-to-end (E2E) SLU architecture based on transformer networks, which allows the use of self-supervised pre-trained acoustic features, pre-trained model initialization, and multi-task training. We perform several SLU experiments on the ATIS dataset, predicting intent and entity labels/values. These experiments investigate how pre-trained model initialization and multi-task training interact with either traditional filterbank features or self-supervised pre-trained acoustic features. The results show that self-supervised pre-trained acoustic features outperform filterbank features in almost all experiments, and that, when combined with multi-task training, they almost eliminate the need for pre-trained model initialization. 5 pages, 3 tables and 1 figure.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::a0e2f52c5ef493d0a61989a55a633f22 http://arxiv.org/abs/2011.08238