Self-supervised learning based knowledge distillation framework for automatic speech recognition for hearing impaired
Author: | L. Ashok Kumar, D. Karthika Renuka, Priya M C Shunmuga, G Madhumitha, S Priyanka, M Sangeeth, R Subhiksha |
---|---|
Year of publication: | 2022 |
Subject: | |
Source: | International journal of health sciences. |
ISSN: | 2550-696X 2550-6978 |
DOI: | 10.53730/ijhs.v6ns1.7865 |
Description: | Speech processing applications, particularly speech recognition, have received considerable attention in recent decades. In recent years, research has focused on using deep learning for speech-related applications. This branch of machine learning has outperformed other approaches in a range of applications, including voice, and has thus become a particularly appealing research subject. Noise, speaker variability, language variability, vocabulary size, and domain remain among the most significant research challenges in speech recognition. We investigate self-supervised algorithms for unlabelled data. In recent years, these algorithms have progressed significantly, with their efficacy approaching that of supervised pre-training alternatives across a variety of data modalities such as image and video. The purpose of this research is to develop powerful models for audio speech recognition that do not require human annotation. We accomplish this by distilling information from an automatic speech recognition (ASR) model that was trained on a large audio-only corpus. We combine Connectionist Temporal Classification (CTC) loss and KL divergence loss in the distillation technique. We demonstrate that distillation significantly speeds up training. We evaluate our model using Word Error Rate (WER). |
Database: | OpenAIRE |
External link: |
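
The abstract names a KL divergence distillation term, a CTC term, and Word Error Rate as the evaluation metric. The record gives no implementation details, so the following is only a minimal sketch under stated assumptions: the function names, the softmax temperature, the weighting factor `kl_weight`, and the additive way the two losses are combined are all hypothetical and not taken from the paper. The CTC loss itself is not implemented here; it is assumed to be computed elsewhere and passed in as a number.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert one frame of logits into a probability distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def distillation_kl(teacher_logits, student_logits, temperature=1.0):
    """Mean per-frame KL(teacher || student) over aligned frames.

    Assumption: teacher and student emit one logit vector per frame,
    over the same vocabulary and with the same number of frames.
    """
    frames = list(zip(teacher_logits, student_logits))
    if not frames:
        raise ValueError("at least one frame is required")
    total = sum(
        kl_divergence(softmax(t, temperature), softmax(s, temperature))
        for t, s in frames
    )
    return total / len(frames)


def combined_loss(ctc_loss, kl_loss, kl_weight=0.5):
    """Hypothetical weighted sum of a precomputed CTC loss and the KL term."""
    return ctc_loss + kl_weight * kl_loss


def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, substitution))
        prev = cur
    return prev[-1] / len(ref)
```

For example, `word_error_rate("the cat sat", "the dog sat")` counts one substitution against three reference words. In a real training setup the CTC term would typically come from a deep learning framework rather than being passed in by hand.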