Self-supervised learning based knowledge distillation framework for automatic speech recognition for hearing impaired

Authors: L. Ashok Kumar, D. Karthika Renuka, Priya M C Shunmuga, G Madhumitha, S Priyanka, M Sangeeth, R Subhiksha
Publication year: 2022
Subject:
Source: International journal of health sciences.
ISSN: 2550-696X, 2550-6978
DOI: 10.53730/ijhs.v6ns1.7865
Description: Speech processing applications, particularly speech recognition, have received considerable attention in recent decades. In recent years, research has focused on using deep learning for speech-related applications. This branch of machine learning has outperformed others in a range of applications, including voice, and has thus become a particularly appealing research subject. Noise, speaker variability, language variability, vocabulary size, and domain remain among the most significant research challenges in speech recognition. We investigate self-supervised algorithms for unlabelled data. In recent years, these algorithms have progressed significantly, with their efficacy approaching that of supervised pre-training alternatives across a variety of data modalities such as image and video. The purpose of this research is to develop powerful models for audio speech recognition that do not require human annotation. We accomplish this by distilling information from an automatic speech recognition (ASR) model that was trained on a large audio-only corpus. We combine a Connectionist Temporal Classification (CTC) loss with a KL divergence loss in the distillation technique. We demonstrate that distillation significantly speeds up training. We evaluate our model using the Word Error Rate (WER) metric.
Database: OpenAIRE
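
Illustrative note: the description above names a CTC loss, a KL divergence loss, and WER but gives no formulas. The following is only a minimal sketch of how such pieces are commonly combined, written in plain Python so it runs without any deep learning library. The function names, the weighting parameter `kd_weight`, the `temperature` parameter, and the temperature-squared scaling of the KL term are all assumptions for illustration and are not taken from the paper.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one frame of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def logsumexp(values):
    """log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def ctc_loss(log_probs, target, blank=0):
    """Negative log-likelihood of `target` under CTC (forward algorithm).

    log_probs: one list of per-symbol log-probabilities per frame.
    target: list of non-blank symbol ids.
    """
    # Interleave blanks: [a, b] -> [blank, a, blank, b, blank]
    ext = [blank]
    for symbol in target:
        ext += [symbol, blank]
    size = len(ext)
    alpha = [-math.inf] * size
    alpha[0] = log_probs[0][ext[0]]
    if size > 1:
        alpha[1] = log_probs[0][ext[1]]
    for t in range(1, len(log_probs)):
        new = [-math.inf] * size
        for s in range(size):
            cands = [alpha[s]]
            if s >= 1:
                cands.append(alpha[s - 1])
            # Skipping a blank is allowed only between two different symbols.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])
            new[s] = logsumexp(cands) + log_probs[t][ext[s]]
        alpha = new
    # Valid paths end on the last symbol or on the trailing blank.
    return -logsumexp(alpha[-2:])

def kl_distill(teacher_logits, student_logits, temperature=1.0):
    """Mean per-frame KL(teacher || student) on temperature-softened outputs."""
    total = 0.0
    for t_frame, s_frame in zip(teacher_logits, student_logits):
        t_lp = log_softmax([x / temperature for x in t_frame])
        s_lp = log_softmax([x / temperature for x in s_frame])
        total += sum(math.exp(tl) * (tl - sl) for tl, sl in zip(t_lp, s_lp))
    return total / len(teacher_logits)

def distillation_loss(student_logits, teacher_logits, target,
                      kd_weight=0.5, temperature=2.0):
    """Hypothetical weighted sum of a CTC term and a KL term (assumed form)."""
    student_lp = [log_softmax(frame) for frame in student_logits]
    ctc = ctc_loss(student_lp, target)
    kl = kl_distill(teacher_logits, student_logits, temperature)
    # Temperature-squared scaling is a common convention, assumed here.
    return (1.0 - kd_weight) * ctc + kd_weight * temperature ** 2 * kl

def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / reference length.

    Assumes a non-empty reference transcript.
    """
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)
```

In practice a training pipeline would more likely rely on a framework's built-in CTC and KL divergence losses; the hand-written versions here only make the arithmetic explicit. A call such as `wer("the cat sat", "the bat sat")` counts one substitution over three reference words.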