Abstract: |
In today's digital world, natural language is the means by which humans exchange information, and it has advanced to the point of being a criterion for the evolution of technology. Spoken language identification is the process of determining which language a speaker is speaking, and it is used for front-end processing in human-computer interaction. In this study, we developed a language identification model for Ethio-Semitic languages, because language identification is an intermediate task for other Natural Language Processing tasks such as speech-to-text translation, speech-to-speech translation, speech recognition, and speech information retrieval. We trained a Convolutional Neural Network on three acoustic feature sets: Mel-frequency Cepstral Coefficients, mel-spectrogram, and a combined (Mel-frequency Cepstral Coefficients + mel-spectrogram) feature, in order to emphasize the features most critical for identifying the output language. The study's primary goal was to identify four languages: Amharic, Geez, Guragigna, and Tigrigna. The results show that the Convolutional Neural Network with augmented data and the combined features performed better than the same network using Mel-frequency Cepstral Coefficients or mel-spectrogram features alone. The proposed model achieved an average accuracy of 97%, 97.4%, and 99.5% for testing, validation, and training, respectively. We therefore conclude that the combined (mel-spectrogram + Mel-frequency Cepstral Coefficients) feature was the most important feature. [ABSTRACT FROM AUTHOR] |
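The abstract does not say how the two feature types were combined. As a minimal sketch, assuming the combination is a per-frame concatenation along the feature axis, the idea can be illustrated in pure Python. Here the cepstral coefficients are derived from a log-mel spectrogram with an orthonormal DCT-II, which is the standard relationship between the two representations. The function names (`dct_ii`, `mfcc_from_log_mel`, `combine_features`) and the toy input are hypothetical and are not taken from the study; a real pipeline would more likely compute both features with an audio library.

```python
import math

def dct_ii(values, n_coeffs):
    """Orthonormal DCT-II of one frame, keeping the first n_coeffs coefficients."""
    n = len(values)
    out = []
    for k in range(n_coeffs):
        s = sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i, v in enumerate(values))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def mfcc_from_log_mel(log_mel_frames, n_mfcc):
    """Cepstral coefficients per frame: DCT-II applied to log-mel energies."""
    return [dct_ii(frame, n_mfcc) for frame in log_mel_frames]

def combine_features(log_mel_frames, n_mfcc=13):
    """Assumed combination: concatenate mel bands and MFCCs for each frame."""
    mfcc = mfcc_from_log_mel(log_mel_frames, n_mfcc)
    return [mel + cep for mel, cep in zip(log_mel_frames, mfcc)]

# Toy log-mel spectrogram (made-up values): 3 frames x 8 mel bands.
toy_log_mel = [[math.log(1.0 + t + b) for b in range(8)] for t in range(3)]

combined = combine_features(toy_log_mel, n_mfcc=4)
print(len(combined), len(combined[0]))  # 3 frames, 8 + 4 = 12 features each
```

The resulting frames-by-features matrix is the kind of two-dimensional input a Convolutional Neural Network can consume. Other ways of combining the features, such as stacking them as separate input channels, are equally plausible readings of the abstract.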