Large-Scale Self- and Semi-Supervised Learning for Speech Translation
Author: Michael Auli, Changhan Wang, Alexei Baevski, Juan Pino, Anne Wu, Alexis Conneau
Year of publication: 2021
Subject: FOS: Computer and information sciences; Computer Science - Computation and Language (cs.CL); speech translation; semi-supervised learning; self-supervised learning; language models; decoding methods; natural language processing; artificial intelligence
DOI: 10.48550/arxiv.2104.06678
Description: In this paper, we improve speech translation (ST) by effectively leveraging large quantities of unlabeled speech and text data in different and complementary ways. We explore both pretraining and self-training using the large Libri-Light speech audio corpus, and language modeling with CommonCrawl. Our experiments improve over the previous state of the art by 2.6 BLEU on average across all four considered CoVoST 2 language pairs, via a simple recipe combining wav2vec 2.0 pretraining, a single iteration of self-training, and decoding with a language model. Unlike existing work, our approach does not rely on any supervision other than ST data. Code and models will be publicly released. (A hedged sketch of this recipe follows the record below.)
Database: OpenAIRE
External link:
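
The description outlines a three-part recipe: wav2vec 2.0 pretraining of the encoder, a single round of self-training on unlabeled audio, and decoding with an external language model (shallow fusion). The Python sketch below illustrates the latter two ideas under stated assumptions: all interfaces, names, and the `lm_weight` value are hypothetical placeholders, not the authors' released fairseq code.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Illustrative sketch only: the interfaces below are hypothetical
# stand-ins, not the paper's actual implementation.

@dataclass
class Hypothesis:
    tokens: List[str]   # candidate translation
    st_logprob: float   # log p_ST(y | x) from the speech translation model

def shallow_fusion_score(hyp: Hypothesis,
                         lm_logprob: Callable[[Sequence[str]], float],
                         lm_weight: float = 0.3) -> float:
    # Log-linear combination used when decoding with an external LM:
    #   score(y) = log p_ST(y | x) + lm_weight * log p_LM(y)
    # lm_weight is an illustrative value; in practice it is tuned on a dev set.
    return hyp.st_logprob + lm_weight * lm_logprob(hyp.tokens)

def one_round_self_training(
    labeled: List[Tuple[object, List[str]]],  # (audio, reference translation)
    unlabeled_audio: List[object],            # e.g. Libri-Light utterances
    train: Callable[[List[Tuple[object, List[str]]]], object],
    translate: Callable[[object, object], List[str]],
) -> object:
    # Single iteration of self-training, as described in the abstract:
    # 1. train a teacher on the labeled ST data,
    # 2. pseudo-label the unlabeled audio with the teacher,
    # 3. retrain a student on labeled + pseudo-labeled data.
    teacher = train(labeled)
    pseudo_labeled = [(audio, translate(teacher, audio))
                      for audio in unlabeled_audio]
    return train(labeled + pseudo_labeled)
```

In the paper's setup, the teacher pseudo-labels Libri-Light audio and the external language model is trained on CommonCrawl text; the shallow-fusion weight would be tuned per language pair on a development set.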