Residual LSTM neural network for time dependent consecutive pitch string recognition from spectrograms: a study on Turkish classical music makams.

Autor: Mirza, Fuat Kaan, Gürsoy, Ahmet Fazıl, Baykaş, Tunçer, Hekimoğlu, Mustafa, Pekcan, Önder
Zdroj: Multimedia Tools & Applications; Apr2024, Vol. 83 Issue 14, p41243-41271, 29p
Abstrakt: Turkish classical music, characterized by 'makam', specific melodic configurations delineated by sequential pitches and intervals, is rich in cultural significance and poses a considerable challenge in identifying a musical piece's particular makam. This identification complexity remains an issue even for experienced musical experts, emphasizing the need for automated and accurate classification techniques. In response, we introduce a residual LSTM neural network model that classifies makams by leveraging the distinct sequential pitch patterns discerned within various audio segments over spectrogram-based inputs. This model's design uniquely merges the spatial capabilities of two-dimensional convolutional layers with the temporal understanding of one-dimensional convolutional and LSTM mechanisms embedded within a residual framework. Such an integrated approach allows for detailed temporal analysis of shifting frequencies, as revealed in logarithmically scaled spectrograms, and is adept at recognizing consecutive pitch patterns within segments. Employing stratified cross-validation on a comprehensive dataset encompassing 1154 pieces spanning 15 unique makams, we found that our model demonstrated an accuracy of 95.60% for a subset of 9 makams and 89.09% for all 15 makams. Our approach demonstrated consistent precision even when distinguishing makam pairs known for their closely related pitch sequences. To further validate our model's prowess, we conducted benchmark tests against established methodologies found in current literature, providing a comparative assessment of our proposed workflow's abilities. [ABSTRACT FROM AUTHOR]
Databáze: Complementary Index