Speech recognition in adverse conditions by humans and machines.
Author: | Patman C; Theoretical and Applied Linguistics Section, Faculty of Modern and Medieval Languages and Linguistics, University of Cambridge, Sidgwick Avenue, Cambridge CB3 9DA, United Kingdom. Chodroff E; Department of Computational Linguistics, University of Zurich, Andreasstrasse 15, Zurich 8050, Switzerland. Emails: cep72@cam.ac.uk, eleanor.chodroff@uzh.ch |
---|---|
Language: | English |
Source: | JASA express letters [JASA Express Lett] 2024 Nov 01; Vol. 4 (11). |
DOI: | 10.1121/10.0032473 |
Abstract: | In the development of automatic speech recognition systems, achieving human-like performance has been a long-held goal. Recent releases of large spoken language models have claimed to achieve such performance, although direct comparison to humans has been severely limited. The present study tested L1 British English listeners against two automatic speech recognition systems (wav2vec 2.0 and Whisper, base and large sizes) in adverse listening conditions: speech-shaped noise and pub noise, at different signal-to-noise ratios, and recordings produced with or without face masks. Humans maintained the advantage against all systems, except for Whisper large, which outperformed humans in every condition but pub noise. (© 2024 Author(s). All article content, except where otherwise noted, is licensed under a Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).) |
Database: | MEDLINE |