Deformable CNN and Imbalance-Aware Feature Learning for Singing Technique Classification

Authors:	Yamamoto, Yuya; Nam, Juhan; Terasawa, Hiroko
Year of publication:	2022
Subject:	Computer Science - Sound; Computer Science - Multimedia; Electrical Engineering and Systems Science - Audio and Speech Processing
Document type:	Working Paper
Description:	Singing techniques are used for expressive vocal performances by employing temporal fluctuations of the timbre, the pitch, and other components of the voice. Classifying them is challenging, mainly because of two factors: 1) the fluctuations in singing techniques vary widely and are affected by many factors, and 2) existing datasets are imbalanced. To deal with these problems, we developed a novel audio feature learning method based on deformable convolution, with decoupled training of the feature extractor and the classifier using a class-weighted loss function. The experimental results show the following: 1) the deformable convolution improves the classification results, particularly when it is applied to the last two convolutional layers, and 2) both re-training the classifier and weighting the cross-entropy loss function by a smoothed inverse frequency enhance the classification performance. Comment: Accepted to INTERSPEECH 2022.
Database:	arXiv
External link:	http://arxiv.org/abs/2206.12230
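The description mentions weighting the cross-entropy loss "by a smoothed inverse frequency" to counter class imbalance, but it does not state the exact smoothing. The following is a minimal sketch, not the authors' implementation, assuming class weights w_c proportional to (N / n_c)^alpha, where n_c is the number of training examples of class c, N is their total, and alpha is a hypothetical smoothing exponent (0.5 here; alpha = 1 would be plain inverse frequency, alpha = 0 uniform weights). The function names are illustrative only.

```python
import math
from typing import List, Sequence


def smoothed_inverse_frequency_weights(counts: Sequence[int], alpha: float = 0.5) -> List[float]:
    """Per-class weights proportional to (N / n_c) ** alpha, rescaled to average 1.

    ASSUMPTION: the power-law smoothing and alpha = 0.5 are illustrative choices,
    not values taken from the paper.
    """
    if any(n <= 0 for n in counts):
        raise ValueError("every class needs at least one training example")
    total = sum(counts)
    raw = [(total / n) ** alpha for n in counts]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vector of class scores."""
    m = max(logits)  # subtract the max before exponentiating to avoid overflow
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def weighted_cross_entropy(
    batch_logits: Sequence[Sequence[float]],
    labels: Sequence[int],
    weights: Sequence[float],
) -> float:
    """Class-weighted cross-entropy averaged over a batch.

    Each example's negative log-likelihood is scaled by the weight of its true
    class, and the sum is divided by the total weight of the target classes.
    """
    num = 0.0
    den = 0.0
    for logits, y in zip(batch_logits, labels):
        w = weights[y]
        num += -w * log_softmax(logits)[y]
        den += w
    return num / den


if __name__ == "__main__":
    # Hypothetical imbalanced label counts for three singing techniques.
    counts = [900, 90, 10]
    w = smoothed_inverse_frequency_weights(counts)
    print([round(x, 4) for x in w])
    print(weighted_cross_entropy([[2.0, 0.1, -1.0], [0.3, 0.2, 1.5]], [0, 2], w))
```

With alpha = 0.5 the rarest class in this example receives sqrt(90), roughly 9.5 times the weight of the most frequent class, rather than 90 times under plain inverse frequency; this damping is the kind of trade-off that "smoothing" refers to, since unsmoothed weights can let a handful of rare examples dominate the gradient.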