Large-Scale Self- and Semi-Supervised Learning for Speech Translation

Author: Michael Auli, Changhan Wang, Alexei Baevski, Juan Pino, Anne Wu, Alexis Conneau
Year of publication: 2021
DOI: 10.48550/arxiv.2104.06678
Description: In this paper, we improve speech translation (ST) by effectively leveraging large quantities of unlabeled speech and text data in different and complementary ways. We explore both pretraining and self-training using the large Libri-Light speech audio corpus, and language modeling with CommonCrawl. Our experiments improve over the previous state of the art by an average of 2.6 BLEU on all four CoVoST 2 language pairs considered, via a simple recipe combining wav2vec 2.0 pretraining, a single iteration of self-training, and decoding with a language model. Unlike existing work, our approach does not leverage any supervision other than ST data. Code and models will be publicly released.
Database: OpenAIRE
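
The description outlines a three-step recipe: wav2vec 2.0 pretraining of the speech encoder, a single iteration of self-training, and decoding with a language model. Below is a minimal, hypothetical Python sketch of the latter two steps under stated assumptions: the helper `train_st`, the type aliases, the `lm_weight` value, and the use of shallow fusion as the LM decoding scheme are all illustrative guesses, not the authors' released fairseq pipeline.

```python
"""Sketch of the recipe described in the abstract. All names here
(train_st, Audio, Text, lm_weight) are illustrative assumptions."""

from typing import Callable, Iterable, List, Tuple

Audio = List[float]  # raw waveform samples (assumed representation)
Text = str           # target-language translation


def self_train_once(
    labeled: List[Tuple[Audio, Text]],
    unlabeled: Iterable[Audio],
    train_st: Callable[[List[Tuple[Audio, Text]]], Callable[[Audio], Text]],
) -> Callable[[Audio], Text]:
    """One iteration of self-training, as the abstract specifies:
    1. train a teacher ST model on the labeled data (in the paper,
       its encoder is initialized from wav2vec 2.0 pretraining);
    2. pseudo-label the unlabeled speech with the teacher;
    3. retrain a student on labeled plus pseudo-labeled data.
    `train_st` is a hypothetical training routine, not a real API."""
    teacher = train_st(labeled)
    pseudo = [(audio, teacher(audio)) for audio in unlabeled]
    student = train_st(labeled + pseudo)
    return student


def lm_fused_score(st_logprob: float, lm_logprob: float,
                   lm_weight: float = 0.3) -> float:
    """Shallow-fusion hypothesis score for LM decoding: the ST model's
    log-probability plus a weighted LM log-probability (the LM is
    trained on CommonCrawl text in the paper). The 0.3 weight is an
    arbitrary placeholder for a tuned hyperparameter."""
    return st_logprob + lm_weight * lm_logprob
```

In this reading, self-training and LM decoding exploit unlabeled data in complementary ways: the pseudo-labels inject unlabeled *speech* (Libri-Light) into training, while the fused LM injects unlabeled *text* (CommonCrawl) at inference time.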