Word Confidence Estimation For Speech Translation
Author: | Besacier, Laurent; Lecouteux, Benjamin; Luong, Ngoc-Quang; Hour, K; Hadj Salah, Marwa |
---|---|
Contributors: | Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole (GETALP), Laboratoire d'Informatique de Grenoble (LIG), Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP)-Institut National Polytechnique de Grenoble (INPG)-Centre National de la Recherche Scientifique (CNRS)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF), Université Grenoble Alpes [2016-2019] (UGA [2016-2019]) |
Language: | English |
Publication year: | 2014 |
Subject: | |
Source: | International Workshop on Spoken Language Translation (IWSLT), Dec 2014, Lake Tahoe, United States |
Description: | International audience; Word Confidence Estimation (WCE) for machine translation (MT) or automatic speech recognition (ASR) consists in judging each word in the (MT or ASR) hypothesis as correct or incorrect by tagging it with an appropriate label. In the past, this task has been treated separately in ASR and MT contexts; we propose here a joint estimation of word confidence for a spoken language translation (SLT) task involving both ASR and MT. This work is made possible by a dedicated corpus that we built and present first. The corpus contains 2643 speech utterances, each made available as a quintuplet: ASR output (src-asr), verbatim transcript (src-ref), text translation output (tgt-mt), speech translation output (tgt-slt) and post-edition of the translation (tgt-pe). The rest of the paper illustrates how such a corpus (made available to the research community) can be used to evaluate word confidence estimators in ASR, MT or SLT scenarios. WCE for SLT could help rescore SLT output graphs, improve translators' productivity (for translation of lectures or movie subtitling), or be useful in interactive speech-to-speech translation scenarios. Keywords: word confidence estimation (WCE), spoken language translation (SLT), corpus, joint features. |
Database: | OpenAIRE |
External link: |