An ensemble model of CNN with Bi-LSTM for automatic singer identification.

Autor: Jitendra, Mukkamala S. N. V., Radhika, Y.
Zdroj: Multimedia Tools & Applications; Oct2023, Vol. 82 Issue 25, p38853-38874, 22p
Abstrakt: In the present-day scenario, gender detection has become significant in content-based multimedia systems. An automated mechanism for gender identification is mainly in demand to process the massive data. Singer identification is a popular topic in music information recommender systems that includes identifying the singer from the song based on the singer's voice and other background key features like timbre and pitch. Many models like GMM, SVM, and MLP are broadly used for classification and singer identification. Moreover, most current models have limitations where vocals and instrumental music are separated manually, and only vocals are used to build and train the model. To deal with unstructured data like music, the deep learning techniques are very suitable and have exhibited exemplary performance in similar studies. In acoustic modeling, the Deep Neural Networks (DNN) models like convolutional neural networks (CNN) have played a promising role in classifying unstructured and poorly labeled data. In the current study, an ensemble model, a combination of a CNN model with bi-directional LSTM, is considered for singer identification from the spectrogram images generated from the audio clip. CNN models are proven to better handle variable-length input data by identifying the features. Bi-LSTM will yield better accuracy by remembering the essential features over time and addressing temporal contextual information. The experimentation is performed on the Indian songs and MIR-1 k data set, and it is observed that the proposed model has outperformed with a prediction accuracy of 97.4%. The performance of the proposed model is being compared against the existing models in the current study. [ABSTRACT FROM AUTHOR]
Databáze: Complementary Index