DNN-based semantic rescoring models for speech recognition

Authors:	Irina Illina, Dominique Fohr
Contributors:	Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est; Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD); Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA); Institut National de Recherche en Informatique et en Automatique (Inria); Centre National de la Recherche Scientifique (CNRS); Université de Lorraine (UL); MMT. The authors thank the DGA (Direction Générale de l’Armement, part of the French Ministry of Defence), Thales AVS and Dassault Aviation, who are supporting the funding of this study and the 'Man-Machine Teaming' scientific program.
Language:	English
Publication year:	2021
Subject:	Artificial neural network; Computer science; Speech recognition; Automatic speech recognition; Word error rate; 020206 networking & telecommunications; Context (language use); 02 engineering and technology; Semantic data model; Semantics; 01 natural sciences; 0103 physical sciences; 0202 electrical engineering, electronic engineering, information engineering; Word2vec; Language model; [INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC]; 010301 acoustics; semantics embeddings; Word (computer architecture); BERT
Source:	TSD 2021 - 24th International Conference on Text, Speech and Dialogue, Sep 2021, Olomouc, Czech Republic; Text, Speech, and Dialogue, ISBN: 9783030835262; TDS
Description:	In this work, we address the problem of improving an automatic speech recognition (ASR) system. We want to efficiently model long-term semantic relations between words and introduce this information through a semantic model. We propose neural network (NN) semantic models for rescoring the N-best hypothesis list. These models use two types of representations as part of the DNN input features: static word embeddings (from word2vec) and dynamic contextual embeddings (from BERT). Semantic information is computed from these representations and used in a pairwise hypothesis comparison mode. We perform experiments on the publicly available TED-LIUM dataset. In line with our industrial project context, we evaluate on both clean speech and speech mixed with real noise. The proposed BERT-based rescoring approach gives a significant improvement in word error rate (WER) over the ASR system without semantic rescoring models, under all tested conditions and with both n-gram and recurrent NN (Long Short-Term Memory, LSTM) language models.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::02764de7ddbbf8b1e70a39a6a7e69fab https://hal.archives-ouvertes.fr/hal-03239211/document
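
Illustrative note: the description above mentions rescoring an N-best list by comparing hypotheses in pairs. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes a comparator that returns, for a pair (a, b), an estimated probability in [0, 1] that a is the better hypothesis; in the paper this role is played by a DNN fed with word2vec or BERT features, whereas `toy_compare`, the `CONTEXT` word set, the additive score combination, and the `weight` parameter are all hypothetical stand-ins chosen for illustration.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for semantic context; a real system would use a
# trained DNN over word2vec/BERT representations instead.
CONTEXT = {"cat", "mat"}


def toy_compare(a: str, b: str) -> float:
    """Toy comparator: share of in-context words found in `a` versus `b`."""
    ca = sum(word in CONTEXT for word in a.split())
    cb = sum(word in CONTEXT for word in b.split())
    return 0.5 if ca + cb == 0 else ca / (ca + cb)


def pairwise_semantic_scores(
    hypotheses: Sequence[str],
    compare: Callable[[str, str], float],
) -> List[float]:
    """Accumulate one semantic score per hypothesis over all pairs."""
    scores = [0.0] * len(hypotheses)
    for i, j in combinations(range(len(hypotheses)), 2):
        p = compare(hypotheses[i], hypotheses[j])  # P(hyp i better than hyp j)
        scores[i] += p
        scores[j] += 1.0 - p
    return scores


def rescore(
    hypotheses: Sequence[str],
    asr_scores: Sequence[float],
    compare: Callable[[str, str], float],
    weight: float = 1.0,
) -> Tuple[str, List[float]]:
    """Add the weighted semantic score to each ASR score and pick the best."""
    semantic = pairwise_semantic_scores(hypotheses, compare)
    combined = [a + weight * s for a, s in zip(asr_scores, semantic)]
    best = max(range(len(hypotheses)), key=combined.__getitem__)
    return hypotheses[best], combined


# Made-up 3-best list with made-up ASR log-scores (higher is better).
nbest = [
    "the cat sat on the mat",
    "the cat sat on the map",
    "the cab sat on the mat",
]
asr = [-1.2, -1.0, -1.1]

best, combined = rescore(nbest, asr, toy_compare)
print(best)
```

Each pair contributes exactly 1.0 in total to the two hypotheses it compares, so the semantic scores always sum to the number of pairs; how they are weighted against acoustic and language model scores is a tuning choice left open here.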