Automatic human utility evaluation of ASR systems: does WER really predict performance?

Authors: Cosmin Munteanu, Adam Lee, Ani Nenkova, Stephen Tratz, Kyla Cheung, Siavash Kazemian, Clare R. Voss, Benoit Favre, Dennis Ochei, Yang Liu, Frauke Zeller, Gerald Penn
Contributors: Laboratoire d'Informatique Fondamentale de Marseille - UMR 6166 (LIF), Université de la Méditerranée - Aix-Marseille 2-Université de Provence - Aix-Marseille 1-Centre National de la Recherche Scientifique (CNRS), Columbia University [New York], University of Toronto, Department of Chemistry [York, UK], University of York [York, UK], City University of New York [New York] (CUNY), Department of Computer Science [Dallas] (University of Texas at Dallas), University of Texas at Dallas [Richardson] (UT Dallas), University of Pennsylvania [Philadelphia], US Army Research Laboratory-CIS Directorate (ARL), United States Army (U.S. Army), University College London [London] (UCL), Traitement Automatique du Langage Ecrit et Parlé (TALEP), Laboratoire d'Informatique et Systèmes (LIS), Aix Marseille Université (AMU)-Université de Toulon (UTLN)-Centre National de la Recherche Scientifique (CNRS)
Year of publication: 2013
Subject:
Source: INTERSPEECH 2013, Lyon (France)
HAL
Description: We propose an alternative evaluation metric to Word Error Rate (WER) for the decision audit task on meeting recordings, which exemplifies how to evaluate speech recognition within a legitimate application context. Using machine learning on an initial seed of human-subject experimental data, our alternative metric handily outperforms WER, which correlates very poorly with human subjects' success at finding decisions in ASR transcripts spanning a range of WERs.
Database: OpenAIRE
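
Note: The description above gives only the outline of the approach. The following Python fragment is a minimal, hypothetical sketch of the idea, not the paper's actual features, model, or data: it computes a standard word-level WER, fits a regressor on a synthetic "seed" of human-subject success scores to obtain a task-grounded metric, and then compares how each metric rank-correlates with human success on held-out transcripts. The feature set, the choice of RandomForestRegressor, and all numbers here are placeholders.

    import numpy as np
    from scipy.stats import spearmanr
    from sklearn.ensemble import RandomForestRegressor

    def wer(ref: str, hyp: str) -> float:
        """Word error rate: word-level edit distance over reference length."""
        r, h = ref.split(), hyp.split()
        d = np.zeros((len(r) + 1, len(h) + 1), dtype=int)
        d[:, 0] = np.arange(len(r) + 1)  # cost of deleting all reference words
        d[0, :] = np.arange(len(h) + 1)  # cost of inserting all hypothesis words
        for i in range(1, len(r) + 1):
            for j in range(1, len(h) + 1):
                sub = 0 if r[i - 1] == h[j - 1] else 1
                d[i, j] = min(d[i - 1, j] + 1,        # deletion
                              d[i, j - 1] + 1,        # insertion
                              d[i - 1, j - 1] + sub)  # substitution / match
        return d[len(r), len(h)] / max(len(r), 1)

    # Placeholder seed data: per-transcript feature vectors (e.g. error rates
    # on content words or decision-related keywords -- hypothetical features,
    # not the paper's actual set) plus the measured human success score for
    # the decision audit task on each transcript.
    rng = np.random.default_rng(0)
    X = rng.random((40, 3))        # 40 transcripts, 3 features each
    human_success = rng.random(40) # stand-in for human-subject results
    transcript_wer = rng.random(40)  # stand-in transcript-level WERs

    # Fit the learned metric on the seed of human-subject data.
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X[:30], human_success[:30])
    learned_metric = model.predict(X[30:])

    # Compare: how well does each metric rank transcripts by human success?
    rho_learned, _ = spearmanr(learned_metric, human_success[30:])
    rho_wer, _ = spearmanr(transcript_wer[30:], human_success[30:])
    print(f"learned metric vs. human success: rho={rho_learned:.2f}")
    print(f"WER vs. human success:            rho={rho_wer:.2f}")

Spearman's rho is used here because the paper's claim concerns how well a metric predicts the ordering of human task performance, not its absolute value; that choice of correlation statistic is this sketch's assumption, not necessarily the paper's.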