Convolutional neural networks for the classification of guitar effects and extraction of the parameter settings of single and multi-guitar effects from instrument mixes

Authors: Reemt Hinrichs, Kevin Gerkens, Alexander Lange, Jörn Ostermann
Language: English
Year of publication: 2022
Subject:
Source: EURASIP Journal on Audio, Speech, and Music Processing, Vol 2022, Iss 1, Pp 1-21 (2022)
Document type: article
ISSN: 1687-4722
DOI: 10.1186/s13636-022-00257-4
Description: Guitar effects are commonly used in popular music to shape the guitar sound to fit specific genres or to create more variety within musical compositions. The sound is determined not only by the choice of guitar effect but also, heavily, by the parameter settings of that effect. Previous research focused on classifying guitar effects and extracting their parameter settings from solo guitar audio recordings. A more realistic task, however, is classification and extraction from instrument mixes. This work investigates the use of convolutional neural networks (CNNs) for the classification and parameter extraction of guitar effects from audio samples containing guitar, bass, keyboard, and drums. The CNN was compared with previously proposed baseline methods, such as support vector machines and shallow neural networks combined with predesigned features. On two datasets, the CNN achieved classification accuracies 1 to 5 percentage points above the baseline accuracy, reaching up to 97.4 % accuracy. With parameter values between 0.0 and 1.0, mean absolute parameter extraction errors were below 0.016 for the distortion, below 0.052 for the tremolo, and below 0.038 for the slapback delay effect, matching or surpassing the presumed human expert error of 0.05. The CNN approach was found to generalize to further effects, achieving mean absolute parameter extraction errors below 0.05 for the chorus, phaser, reverb, and overdrive effects. For sequentially applied combinations of distortion, tremolo, and slapback delay, the mean extraction error increased slightly relative to the single effects, into the range of 0.05 to 0.1. The CNN was also found to be moderately robust to noise and to pitch changes in the background instrumentation, which suggests that it extracted meaningful features.
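The abstract describes three effects (distortion, tremolo, slapback delay) with normalized parameter values between 0.0 and 1.0. These effects are applied singly and in sequence, and parameter extraction is scored by mean absolute error. Below is a minimal NumPy sketch of that setup. Several parts are hypothetical illustrations and do not come from the authors' implementation: the effect formulas, the mapping from each normalized parameter to a physical range (drive, LFO rate, delay time), the 44.1 kHz sample rate, the effect order, and all function names.

```python
import numpy as np

SR = 44100  # assumed sample rate in Hz (hypothetical)


def distortion(x: np.ndarray, gain: float) -> np.ndarray:
    """Hypothetical tanh soft clipping; gain in [0, 1] maps to a drive of 1..50."""
    drive = 1.0 + 49.0 * gain
    # Normalize so that an input of 1.0 still maps to 1.0.
    return np.tanh(drive * x) / np.tanh(drive)


def tremolo(x: np.ndarray, depth: float, rate: float, sr: int = SR) -> np.ndarray:
    """Hypothetical tremolo: sinusoidal amplitude modulation.
    depth in [0, 1] is the modulation depth; rate in [0, 1] maps to a 1..10 Hz LFO."""
    freq = 1.0 + 9.0 * rate
    t = np.arange(len(x)) / sr
    lfo = 1.0 - depth * 0.5 * (1.0 + np.sin(2.0 * np.pi * freq * t))
    return x * lfo


def slapback_delay(x: np.ndarray, mix: float, time: float, sr: int = SR) -> np.ndarray:
    """Hypothetical slapback: a single echo without feedback.
    time in [0, 1] maps to a 60..180 ms delay; mix in [0, 1] is the echo level."""
    y = np.asarray(x, dtype=float).copy()
    delay = int(round((0.06 + 0.12 * time) * sr))  # always > 0 samples
    y[delay:] += mix * x[:-delay]
    return y


def mean_absolute_error(true_params, predicted_params) -> float:
    """Mean absolute error between true and predicted normalized parameters."""
    true_params = np.asarray(true_params, dtype=float)
    predicted_params = np.asarray(predicted_params, dtype=float)
    return float(np.mean(np.abs(true_params - predicted_params)))


# Usage: a 1 s, 110 Hz test tone stands in for a dry guitar signal.
t = np.arange(SR) / SR
dry = 0.5 * np.sin(2.0 * np.pi * 110.0 * t)
# Sequential combination, applied in the order listed in the abstract (an assumption).
wet = slapback_delay(tremolo(distortion(dry, gain=0.7), depth=0.5, rate=0.3), mix=0.4, time=0.2)

# Scoring a (made-up) prediction of the three effects' main parameters.
error = mean_absolute_error([0.7, 0.5, 0.4], [0.68, 0.55, 0.40])
```

Because every parameter is normalized to [0, 1], a single mean absolute error can be compared across effects and against a fixed tolerance such as the 0.05 cited above. In a real pipeline, the predicted values would come from the trained CNN rather than being typed in by hand.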
Database: Directory of Open Access Journals