Relevance Between Dataset Composition and Test Accuracy Based on Deep Learning Algorithm in Voice Detection

Author: Ren-Jie Liu, 劉人傑
Year of publication: 2018
Document type: thesis (學位論文)
Description: 106
Many music processing systems need to know the locations of vocal segments in a song as a first step for further processing or analysis. In this thesis, we investigated the influence of datasets on vocal detection accuracy. To this end, we first collected music of many genres and manually annotated the clips with ground-truth labels as vocal or non-vocal. Next, we applied various deep learning architectures to train on and predict the music clips in the datasets, and observed the experimental results. To understand the impact of different training datasets on test accuracy, we paired various training and testing datasets from different sources, such as Jamendo and the Free Music Archive (FMA). In addition, we built an in-house testing dataset consisting of audio clips wrongly predicted by various deep learning models; this dataset served as a more difficult and more discriminative test set for comparing the performance of various deep learning algorithms. Our experimental results showed that even with a small amount of training data (fewer than 1k training samples), the deep neural network models could still learn the simple underlying patterns of music and achieve an accuracy of about 70%. Once the number of training samples reached a certain level, it was difficult, if not futile, to increase the accuracy through data augmentation or energy normalization. Even adding more “real” training samples brought only marginal improvement. In conclusion, to train a deep neural network effectively and efficiently to a satisfactory accuracy, it is necessary to understand the content of the audio clips to be detected sufficiently well, and then to prepare and construct a corresponding training dataset accordingly.
Database: Networked Digital Library of Theses & Dissertations