Authors: |
Mukherjee, Himadri; Dhar, Ankita; Obaidullah, Sk Md; Santosh, KC; Phadikar, Santanu; Roy, Kaushik; Pal, Umapada |
Source: |
Multimedia Tools & Applications; Jun 2024, Vol. 83, Issue 19, pp. 56883-56907, 25 pp. |
Abstract: |
Speech recognition technologies are considerably more developed in Western countries than in the countries of the South Asian subcontinent, such as India. India is a multilingual country (22 scheduled languages) with a population of over 1.3 billion, a large share of whom have difficulty with the user interfaces of modern technology, which makes speech recognition tools very useful. In this paper, we propose LIFA (Language Identification From Audio), a fully automated tool that identifies the spoken language (phrases/words) and invokes the language-specific recognition engine. Experiments were performed on more than 2200 hours of data from the top 11 spoken languages in India. The clips were parameterized with novel linear predictive cepstral coefficient (LPCC)-based features, which we call LPCC-Grade (LPCC-G). The proposed features focus on the distribution of energy across different frequency ranges in an audio clip for better classification while avoiding high-dimensionality issues. Using a random forest-based classifier, we achieved a highest accuracy of 99.01%. We further tested the robustness of the system under different noisy scenarios on multiple datasets, obtaining accuracies in the range of 79%-98%. For comparison, we also evaluated other popular existing features: LSF- and MFCC-based features yielded accuracies of 96.37% and 92.48%, respectively. [ABSTRACT FROM AUTHOR] |
Database: |
Complementary Index |