Whispered Speech Recognition Based on Audio Data Augmentation and Inverse Filtering

Authors:	Jovan Galić, Branko Marković, Đorđe Grozdić, Branislav Popović, Slavko Šajić
Language:	English
Year of publication:	2024
Subject:	artificial neural networks; audio databases; automatic speech recognition; convolutional neural network; hidden Markov models; inverse filtering; Technology; Engineering (General). Civil engineering (General); TA1-2040; Biology (General); QH301-705.5; Physics; QC1-999; Chemistry; QD1-999
Source:	Applied Sciences, Vol 14, Iss 18, p 8223 (2024)
Document type:	article
ISSN:	2076-3417
DOI:	10.3390/app14188223
Description:	Modern Automatic Speech Recognition (ASR) systems are designed primarily to recognize normal speech. Because of the considerable acoustic mismatch between normal speech and whisper, ASR systems lose significant performance on whispered speech. Creating large databases of whispered speech is expensive and time-consuming, so researchers explore synthetic data generation from pre-existing normal or whispered speech databases. This study examines the impact of standard audio data augmentation techniques on the accuracy of isolated-word recognizers based on Hidden Markov Models (HMM) and Convolutional Neural Networks (CNN). It also explores the potential of inverse filtering as an augmentation strategy for producing pseudo-whisper speech. The Whi-Spe speech database, which contains recordings in normal and whisper phonation, is used for data augmentation, while an internally recorded speech database, developed specifically for this study, is used for testing. Experimental results demonstrate a statistically significant improvement in performance when data augmentation strategies and inverse filtering are employed.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/5fb45371a47c41fe8d1f0764bdf75b57
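
The description names inverse filtering as a way to produce pseudo-whisper speech but does not give the procedure. As an illustration only, the following is a minimal sketch of one common reading of the idea, assuming linear-prediction (LPC) inverse filtering: the vocal-tract envelope of a frame is estimated, the frame is inverse filtered to obtain the excitation residual, and the residual is replaced by white noise of equal energy before resynthesis. The function names, the LPC order of 12, and the noise-substitution step are assumptions made here and are not taken from the article.

```python
import numpy as np


def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC via Levinson-Durbin; returns a with a[0] == 1."""
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1 : n + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0] + 1e-12  # small floor avoids division by zero on silent frames
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err  # reflection coefficient
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def inverse_filter(frame, a):
    """Apply the FIR filter A(z) to the frame, giving the excitation residual."""
    return np.convolve(frame, a)[: len(frame)]


def all_pole_filter(excitation, a):
    """Apply 1/A(z) to an excitation signal (zero initial state)."""
    y = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for k in range(1, min(n, len(a) - 1) + 1):
            acc -= a[k] * y[n - k]
        y[n] = acc
    return y


def pseudo_whisper_frame(frame, order=12, rng=None):
    """Hypothetical helper: swap the residual for energy-matched white noise."""
    rng = np.random.default_rng() if rng is None else rng
    frame = np.asarray(frame, dtype=float)
    # Window is used for envelope estimation only (assumption of this sketch).
    a = lpc_coefficients(frame * np.hanning(len(frame)), order)
    residual = inverse_filter(frame, a)
    noise = rng.standard_normal(len(frame))
    rms = lambda x: np.sqrt(np.mean(x**2))
    noise *= rms(residual) / (rms(noise) + 1e-12)
    return all_pole_filter(noise, a)
```

A full pipeline would apply this frame by frame with overlap-add; that step is omitted here to keep the sketch short. Replacing the residual with noise removes the periodic (voiced) excitation while keeping the estimated spectral envelope, which is why it is a plausible stand-in for whisper-like phonation, though the article's actual procedure may differ.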