Sound Classification and Detection using Deep Learning

Author: Dang Thi Thuy An, 鄧氏陲殷
Year of publication: 2017
Document type: thesis (學位論文)
Description: 105
In this work, we develop several deep learning models for acoustic scene classification (ASC) and sound event detection (SED) in real-life environments. In particular, we take advantage of both convolutional neural networks (CNNs) and recurrent neural networks (RNNs) for audio signal processing, and our proposed models are built from these two network types. CNNs provide an effective way to capture spatial information in multidimensional data, while RNNs are powerful at learning from temporal sequences. We conduct experiments on three development datasets from the DCASE 2017 challenge: the acoustic scene dataset, the rare sound event dataset, and the polyphonic sound event dataset. To reduce overfitting, since the data is limited, we employ data augmentation techniques such as setting input values to zero with a given probability, adding Gaussian noise, and changing the sound loudness. The proposed methods outperform the DCASE 2017 challenge baselines on all three datasets. The accuracy of acoustic scene classification improves by 7.2% in comparison with the baseline. For rare sound event detection, we report an average error rate of 0.26 and an F-score of 85.9%, compared to 0.53 and 72.7% for the baseline. For polyphonic sound event detection, our method obtains a slight improvement, with an error rate of 0.59 against 0.69 for the baseline.
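The three augmentation techniques named in the abstract can be illustrated with a minimal NumPy sketch. This is not the thesis's implementation: the function names, the parameter values, and the choice to express loudness change as a gain in decibels are all assumptions made for illustration only.

```python
import numpy as np


def zero_out(x, prob, rng):
    """Set each input value to zero independently with probability `prob`."""
    x = np.asarray(x, dtype=np.float64)
    drop_mask = rng.random(x.shape) < prob  # True where a value is dropped
    return np.where(drop_mask, 0.0, x)


def add_gaussian_noise(x, std, rng):
    """Add zero-mean Gaussian noise with standard deviation `std`."""
    x = np.asarray(x, dtype=np.float64)
    return x + rng.normal(0.0, std, size=x.shape)


def change_loudness(waveform, gain_db):
    """Scale a waveform's amplitude by `gain_db` decibels (assumed convention)."""
    waveform = np.asarray(waveform, dtype=np.float64)
    return waveform * 10.0 ** (gain_db / 20.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical one-second mono clip at 16 kHz, used only as stand-in data.
    clip = rng.uniform(-0.5, 0.5, size=16000)
    augmented = change_loudness(clip, gain_db=-3.0)
    augmented = add_gaussian_noise(augmented, std=0.005, rng=rng)
    augmented = zero_out(augmented, prob=0.1, rng=rng)
    print(augmented.shape)
```

In practice the zeroing and noise steps might equally be applied to spectral features rather than the raw waveform; the abstract does not say which, so the sketch leaves each function usable on any array.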
Database: Networked Digital Library of Theses & Dissertations