LIFA: Language identification from audio with LPCC-G features.

Authors: Mukherjee, Himadri; Dhar, Ankita; Obaidullah, Sk Md; Santosh, KC; Phadikar, Santanu; Roy, Kaushik; Pal, Umapada
Subject:
Source: Multimedia Tools & Applications; Jun 2024, Vol. 83 Issue 19, p56883-56907, 25p
Abstract: Speech recognition technologies are considerably more developed in Western countries than in the countries of the South Asian subcontinent, such as India. India is a multilingual country (22 scheduled languages) with a population of over 1.3 billion, a major percentage of whom have difficulty with the user interfaces of modern technology, so speech recognition tools are very useful. In this paper, we propose LIFA (Language Identification From Audio), a fully automated tool that can identify the spoken language (phrases/words) and invoke the language-specific recognition engine. Experiments were performed on more than 2200 hours of data from the top 11 spoken languages in India. The clips were parameterized with novel linear predictive cepstral coefficient (LPCC)-based features, which we call LPCC-Grade (LPCC-G). The proposed feature focuses on the distribution of energy across different frequency ranges in an audio clip for better classification while avoiding high-dimensionality issues. Using a random forest-based classifier, we achieved a highest accuracy of 99.01%. Further, we tested the robustness of the system under different noisy scenarios on multiple datasets, obtaining accuracies in the range of 79%-98%. For comparison, we also evaluated other popular existing features: LSF- and MFCC-based features yielded accuracies of 96.37% and 92.48%, respectively. [ABSTRACT FROM AUTHOR]
Database: Complementary Index