Investigating Self-supervised Pre-training for End-to-end Speech Translation

Authors: Fethi Bougares, Natalia A. Tomashenko, Ha Nguyen, Yannick Estève, Laurent Besacier
Contributors: Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole (GETALP), Laboratoire d'Informatique de Grenoble (LIG), Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP), Université Grenoble Alpes (UGA), Laboratoire d'Informatique de l'Université du Mans (LIUM), Le Mans Université (UM), Laboratoire Informatique d'Avignon (LIA), Centre d'Enseignement et de Recherche en Informatique - CERI-Avignon Université (AU), Institut Universitaire de France (IUF), Ministère de l'Education nationale, de l’Enseignement supérieur et de la Recherche (M.E.N.E.S.R.), ANR-19-P3IA-0003, MIAI, MIAI @ Grenoble Alpes (2019)
Language: English
Year of publication: 2020
Source: Interspeech 2020
Interspeech 2020, Oct 2020, Shanghai (virtual conference), China
Description: International audience; Self-supervised learning from raw speech has been proven beneficial for improving automatic speech recognition (ASR). We investigate here its impact on end-to-end automatic speech translation (AST) performance. We use a contrastive predictive coding (CPC) model pre-trained on unlabeled speech as a feature extractor for a downstream AST task. We show that self-supervised pre-training is particularly effective in low-resource settings and that fine-tuning CPC models on the AST training data further improves performance. Even in higher-resource settings, ensembling AST models trained with filter-bank and CPC representations leads to near state-of-the-art models without using any ASR pre-training. This might be particularly beneficial when one needs to develop a system that translates from speech in a language with poorly standardized orthography, or even from speech in an unwritten language. Index Terms: self-supervised learning from speech, automatic speech translation, end-to-end models, low-resource settings.
Database: OpenAIRE
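The CPC pre-training mentioned in the abstract optimizes a contrastive (InfoNCE) objective: a context vector predicts a future latent frame, which must be distinguished from negative distractor frames. Below is a minimal NumPy sketch of that loss for a single prediction step; the vector dimensions, the function name `info_nce_loss`, and the bilinear predictor `W` are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def info_nce_loss(c_t, z_future, z_negatives, W):
    """InfoNCE loss for one CPC prediction step (illustrative sketch).

    c_t:         context vector, shape (d_c,)
    z_future:    positive future latent frame, shape (d_z,)
    z_negatives: distractor latent frames, shape (n_neg, d_z)
    W:           bilinear prediction matrix, shape (d_c, d_z)
    """
    pred = c_t @ W                        # predicted future latent
    scores = np.concatenate((
        [pred @ z_future],                # positive score at index 0
        z_negatives @ pred))              # negative scores
    # numerically stable log-softmax; loss is -log p(positive)
    scores = scores - scores.max()
    log_probs = scores - np.log(np.exp(scores).sum())
    return -log_probs[0]

# Toy example with random vectors (dimensions chosen arbitrarily)
rng = np.random.default_rng(0)
d_c, d_z, n_neg = 8, 4, 10
loss = info_nce_loss(rng.normal(size=d_c),
                     rng.normal(size=d_z),
                     rng.normal(size=(n_neg, d_z)),
                     rng.normal(size=(d_c, d_z)))
```

In the downstream AST setup described by the abstract, the encoder producing these latents would simply replace filter-bank features as the input representation, optionally fine-tuned on the AST training data.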