Whisper for L2 speech scoring.

Authors: Ballier, Nicolas, Arnold, Taylor, Méli, Adrien, Thurston, Tori, Yunès, Jean-Baptiste
Source: International Journal of Speech Technology; Dec 2024, Vol. 27 Issue 4, p923-934, 12p
Abstract: In this paper, we examine whether confidence scores produced by the C++ re-implementation of Whisper (Radford et al., in: International conference on machine learning, 2023) can be used to score L2 learners of English and classify them. We test whether the language prediction and its probability can be used to classify French learners of English using a specifically collected dataset for read speech and a graded corpus, the ANGLISH corpus (Tortel and Hirst, in: Proceedings of speech prosody 2010, 2010. https://doi.org/10.21437/SpeechProsody.2010-49). We show that probability scores associated with the Whisper subtokens can be used to classify learners into levels using the k-nearest neighbours (kNN) algorithm. We show the limitations of the language detection probability beyond an initial threshold where the native language L1 of the learner can actually be predicted by the speaker. We have also used the ISLE corpus (Menzel et al., in: Proceedings of LREC 2000: Language resources and evaluation conference, European Language Resources Association, 2000) to test the prediction of the levels of Italian and German learners of English (Atwell et al., in: ICAME Journal, 27:5–18, 2003). We show how language detection for Whisper's multilingual larger models can be used to detect less advanced learners' first language but cannot be used for learner level classification with advanced learners. Using a greedy alignment algorithm, we also discuss the confidence score assigned to Whisper output subtokens and how this may be used for speaker scoring, prediction of learner levels, and learner feedback. We show that low confidence scores and alternative transcriptions can be used as potential cues for learner pronunciation errors. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
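
Illustration: the abstract reports that learners can be classified into levels from Whisper subtoken probabilities with a kNN algorithm, but does not give the features, the value of k, or the distance measure. The following is only a minimal sketch under assumed choices: each recording is summarized by its mean subtoken probability and the share of subtokens below a threshold, and the level is the majority label among the three nearest neighbours by Euclidean distance. The names `featurize` and `knn_predict`, the threshold 0.5, the level labels, and all probability values are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import dist


def featurize(token_probs, low=0.5):
    """Summarize one recording as (mean probability, share of subtokens below `low`).

    Both features and the 0.5 threshold are assumptions made for this sketch.
    """
    mean = sum(token_probs) / len(token_probs)
    low_share = sum(p < low for p in token_probs) / len(token_probs)
    return (mean, low_share)


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points closest to `query`."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Invented toy data: per-recording subtoken probabilities with a level label.
recordings = [
    ([0.95, 0.91, 0.88, 0.97], "advanced"),
    ([0.90, 0.93, 0.85, 0.96], "advanced"),
    ([0.92, 0.89, 0.94, 0.90], "advanced"),
    ([0.55, 0.42, 0.71, 0.38], "beginner"),
    ([0.48, 0.60, 0.35, 0.52], "beginner"),
    ([0.62, 0.40, 0.45, 0.58], "beginner"),
]
train = [(featurize(probs), label) for probs, label in recordings]

# Classify a new (also invented) recording from its subtoken probabilities.
predicted = knn_predict(train, featurize([0.50, 0.44, 0.66, 0.41]))
print(predicted)
```

In practice the probabilities would come from the recognizer's per-subtoken output rather than hand-written lists, and the feature set and k would need to be validated against graded data such as the corpora named in the abstract.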