Hyperkinetic Dysarthria voice abnormalities: a neural network solution for text translation.

Authors: Hashan, Antor Mahamudul, Dmitrievich, Chaganov Roman, Valerievich, Melnikov Alexander, Vasilyevich, Dorokh Danila, Alexandrovich, Khlebnikov Nikolai, Bredikhin, Boris Andreevich
Subject:
Source: International Journal of Speech Technology; Mar 2024, Vol. 27, Issue 1, pp. 255-265, 11 p.
Abstract: A defect speech recognition (DSR) system has the potential to significantly improve the quality of life of people with speech disorders. In this paper, we develop ConvGRUSpeechNet, a novel model for recognizing and understanding hyperkinetic dysarthria disorder (HDD) speech. The proposed model combines convolutional layers, recurrent layers (GRU and BiGRU), and dense layers with a LogSoftmax function to recognize HDD speech and translate it into text. To prevent overfitting and handle data imbalances, we employed data augmentation and splitting functions during training. Mel-frequency cepstral coefficients (MFCCs) were also employed to reduce the issue of vanishing gradients. In addition, we created a dataset of Russian speech comprising 2000 recordings of HDD speech. The primary objective of this research is to improve speech recognition for individuals with HDD by employing the ConvGRUSpeechNet model. The proposed DSR system achieved a character error rate (CER) of 12.35% on the test dataset. Under the same conditions, the experimental findings show that the proposed solution outperforms the existing state-of-the-art CBNs and TDNN-F LF-MMI models. Furthermore, we deployed the TensorFlow model on a Flask server, making it accessible for use in a web application. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
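
Note on the metric: the abstract reports its result as a character error rate (CER). As a minimal sketch of how CER is conventionally defined (character-level Levenshtein distance divided by the reference length), and not the authors' own evaluation code, it can be computed as follows. The function names `edit_distance` and `character_error_rate` are illustrative assumptions, and the record does not state any text normalization applied before scoring.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum number of character insertions,
    deletions, and substitutions turning `reference` into `hypothesis`."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]  # distance from i reference characters to an empty hypothesis
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the reference."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

With this definition, a CER of 12.35% corresponds to roughly 12 character edits per 100 reference characters. CER can exceed 1.0 when the hypothesis is much longer than the reference.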