Vision Transformer for Music Genre Classification using Mel-frequency Cepstrum Coefficient
Author: | Yash Khasgiwala, Jash Tailor |
---|---|
Year of publication: | 2021 |
Subject: | business.industry; Computer science; Speech recognition; Feature extraction; ComputingMilieux_PERSONALCOMPUTING; Task (project management); ComputingMethodologies_PATTERNRECOGNITION; Leverage (statistics); The Internet; Mel-frequency cepstrum; Music industry; Architecture; business; Transformer (machine learning model) |
Source: | 2021 IEEE 4th International Conference on Computing, Power and Communication Technologies (GUCON). |
DOI: | 10.1109/gucon50781.2021.9573568 |
Description: | The music industry has undergone seismic changes since the dawn of the Internet. It now reaches a much larger audience because the appetite for different music styles has increased. New songs are written and released every day, which makes genre classification a tedious and lengthy task. A good music application should identify the genre of a song accurately and recommend songs in the user's preferred genre. To that end, we use a transformer-based model and report that it outperforms the frequently used CNN model. Acoustic features (MFCC) are extracted from the audio files of the FMA dataset for genre classification. The novel Vision Transformer, RNN-LSTM, and CNN-based architectures are trained and evaluated on the MFCC representations of the songs. |
Database: | OpenAIRE |
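
The description mentions feeding MFCC representations to a Vision Transformer. A Vision Transformer generally treats its 2-D input as a sequence of non-overlapping patches, each flattened into one token. The record does not state the patch size, the MFCC dimensions, or any preprocessing, so the following is only a minimal sketch under assumed values: the function name `split_into_patches`, the toy 4 x 6 matrix, and the 2 x 3 patch size are all hypothetical and not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]


def split_into_patches(mfcc: Matrix, patch_h: int, patch_w: int) -> List[List[float]]:
    """Cut a 2-D MFCC matrix (coefficients x frames) into non-overlapping
    patches and flatten each patch, row-major, into one token vector."""
    rows, cols = len(mfcc), len(mfcc[0])
    if rows % patch_h or cols % patch_w:
        raise ValueError("matrix dimensions must be divisible by the patch size")
    patches = []
    # Walk the matrix patch by patch, top-to-bottom then left-to-right.
    for top in range(0, rows, patch_h):
        for left in range(0, cols, patch_w):
            patch = [
                mfcc[r][c]
                for r in range(top, top + patch_h)
                for c in range(left, left + patch_w)
            ]
            patches.append(patch)
    return patches


# Toy input (assumed values): 4 coefficients x 6 frames.
toy = [[float(r * 6 + c) for c in range(6)] for r in range(4)]
tokens = split_into_patches(toy, patch_h=2, patch_w=3)
print(len(tokens), len(tokens[0]))
```

In a full model, each flattened patch would typically be linearly projected and combined with a position embedding before entering the transformer encoder; those steps are omitted here.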