Automatic human utility evaluation of ASR systems: does WER really predict performance?
Author: Cosmin Munteanu, Adam Lee, Ani Nenkova, Stephen Tratz, Kyla Cheung, Siavash Kazemian, Clare R. Voss, Benoit Favre, Dennis Ochei, Yang Liu, Frauke Zeller, Gerald Penn
Contributors: Laboratoire d'Informatique Fondamentale de Marseille - UMR 6166 (LIF), Université de la Méditerranée - Aix-Marseille 2-Université de Provence - Aix-Marseille 1-Centre National de la Recherche Scientifique (CNRS), Columbia University [New York], University of Toronto, Department of Chemistry [York, UK], University of York [York, UK], City University of New York [New York] (CUNY), Department of Computer Science [Dallas] (University of Texas at Dallas), University of Texas at Dallas [Richardson] (UT Dallas), University of Pennsylvania [Philadelphia], US Army Research Laboratory-CIS Directorate (ARL), United States Army (U.S. Army), University College of London [London] (UCL), Traitement Automatique du Langage Ecrit et Parlé (TALEP), Laboratoire d'Informatique et Systèmes (LIS), Aix Marseille Université (AMU)-Université de Toulon (UTLN)-Centre National de la Recherche Scientifique (CNRS), University of Pennsylvania, Favre, Benoit
Year of publication: 2013
Subject: Computer science; Speech recognition; Word error rate; Metric (unit); Task (project management); [INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL]; 02 engineering and technology; 020206 networking & telecommunications; 0202 electrical engineering, electronic engineering, information engineering; 03 medical and health sciences; 030507 speech-language pathology & audiology; 0305 other medical science; ComputingMilieux_MISCELLANEOUS
Source: INTERSPEECH 2013, Lyon, France; HAL
Description: We propose an alternative evaluation metric to Word Error Rate (WER) for the decision audit task of meeting recordings, which exemplifies how to evaluate speech recognition within a legitimate application context. Using machine learning on an initial seed of human-subject experimental data, our alternative metric handily outperforms WER, which correlates very poorly with human subjects' success in finding decisions given ASR transcripts with a range of WERs.
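For reference, below is a minimal sketch of how the conventional WER metric discussed in the abstract is usually computed (word-level edit distance normalized by reference length). The function name, example strings, and implementation details are illustrative assumptions, not the paper's proposed alternative metric.

```python
# Minimal sketch of conventional Word Error Rate (WER):
# (substitutions + insertions + deletions) / number of reference words.
# Illustrative only; this is the baseline metric the paper argues against,
# not the learned alternative metric the paper proposes.

def wer(reference: str, hypothesis: str) -> float:
    ref = reference.split()
    hyp = hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

if __name__ == "__main__":
    # Hypothetical transcript pair: WER = 2/6 ≈ 0.33
    print(wer("find the decision about the budget",
              "find decision about a budget"))
```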
Database: OpenAIRE
External link: