BERT-based Semantic Model for Rescoring N-best Speech Recognition List
Autor: | Irina Illina, Dominique Fohr |
---|---|
Přispěvatelé: | Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL), The authors thank the DGA (Direction Générale de l’Armement, part of the French Ministry of Defence), Thales AVS and Dassault Aviation who are supporting the funding of this study and the 'Man-Machine Teaming' scientific program in which this project is taking place., Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS) |
Jazyk: | angličtina |
Rok vydání: | 2021 |
Předmět: |
Artificial neural network
Computer science Speech recognition Automatic speech recognition Word error rate 020206 networking & telecommunications 02 engineering and technology Semantic data model Semantic consistency 030507 speech-language pathology & audiology 03 medical and health sciences Noise semantic context 0202 electrical engineering electronic engineering information engineering Semantic context Word2vec [INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC] 0305 other medical science embeddings Word (computer architecture) BERT |
Zdroj: | INTERSPEECH 2021 INTERSPEECH 2021, Aug 2021, Brno, Czech Republic INTERSPEECH 2021, Aug 2021, Brno, Czech Republic. ⟨10.21437/Interspeech.2021-313⟩ |
DOI: | 10.21437/Interspeech.2021-313⟩ |
Popis: | International audience; This work aims to improve automatic speech recognition (ASR) by modeling long-term semantic relations. We propose to perform this through rescoring the ASR N-best hypotheses list. To achieve this, we propose two deep neural network (DNN) models and combine semantic, acoustic, and linguistic information. Our DNN rescoring models are aimed at selecting hypotheses that have better semantic consistency and therefore lower WER. We investigate a powerful representation as part of input features to our DNN model: dynamic contextual embeddings from Transformer-based BERT. Acoustic and linguistic features are also included. We perform experiments on the publicly available dataset TED-LIUM. We evaluate in clean and in noisy conditions, with n-gram and Recurrent Neural Network Language Model (RNNLM), more precisely Long Short-Term Memory (LSTM) model. The proposed rescoring approaches give significant WER improvements over the ASR system without rescoring models. Furthermore, the combination of rescoring methods based on BERT and GPT-2 scores achieves the best results. |
Databáze: | OpenAIRE |
Externí odkaz: |