Vision Transformer for Music Genre Classification using Mel-frequency Cepstrum Coefficient

Authors: Yash Khasgiwala, Jash Tailor
Publication year: 2021
Subject:
Source: 2021 IEEE 4th International Conference on Computing, Power and Communication Technologies (GUCON).
DOI: 10.1109/gucon50781.2021.9573568
Description: The music industry has gone through seismic changes since the dawn of the Internet. It now reaches a much larger audience because the appetite for different musical styles has grown. New songs are written and released every day, which makes genre classification a tedious and time-consuming task. A good music application should be able to identify the genre of a song precisely and recommend songs in the user's preferred genre. To that end, we leverage a transformer-based model that outperforms the frequently used CNN model. In this research, Mel-frequency cepstral coefficients (MFCC) are extracted as acoustic features from the audio files of the FMA dataset for genre classification. The novel Vision Transformer, RNN-LSTM, and CNN-based architectures are then trained and evaluated on the MFCC representations of the songs.
Database: OpenAIRE
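
Note: The description mentions feeding MFCC representations to a Vision Transformer. As a rough illustration of the first step such a model typically performs, the sketch below splits a 2-D MFCC matrix (coefficients by frames) into flattened, non-overlapping patches. It is a minimal sketch, not the authors' implementation: the function name `split_into_patches`, the toy matrix, and the patch size are all assumptions chosen for illustration, and the record does not state the patch size or input shape used in the paper.

```python
from typing import List

Matrix = List[List[float]]


def split_into_patches(mfcc: Matrix, patch_h: int, patch_w: int) -> List[List[float]]:
    """Split a 2-D matrix into flattened, non-overlapping patches.

    Hypothetical helper: rows are MFCC coefficients, columns are time frames.
    Patches are returned in row-major order (left to right, top to bottom).
    """
    n_rows = len(mfcc)
    n_cols = len(mfcc[0]) if n_rows else 0
    if n_rows % patch_h or n_cols % patch_w:
        raise ValueError("matrix dimensions must be divisible by the patch size")

    patches: List[List[float]] = []
    for top in range(0, n_rows, patch_h):
        for left in range(0, n_cols, patch_w):
            # Flatten one patch_h x patch_w block into a single vector.
            patch = [
                mfcc[r][c]
                for r in range(top, top + patch_h)
                for c in range(left, left + patch_w)
            ]
            patches.append(patch)
    return patches


# Toy 4 x 6 matrix standing in for real MFCC values (assumed data).
toy_mfcc = [[float(r * 6 + c) for c in range(6)] for r in range(4)]
patches = split_into_patches(toy_mfcc, patch_h=2, patch_w=3)
```

In a full model, each flattened patch would then be linearly projected to an embedding and combined with a position embedding before entering the transformer encoder; those steps are omitted here.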