Acoustic Event Detection with MobileNet and 1D-Convolutional Neural Network
Authors: Chin Poo Lee, Kian Ming Lim, Pooi Shiang Tan, Cheah Heng Tan
Year of publication: 2020
Subject: Computer science, Event (computing), Deep learning, Pattern recognition, Overfitting, Convolutional neural network, Convolution, Feature (computer vision), Spectrogram, Artificial intelligence, Energy (signal processing), Dropout (neural networks), Sound wave
Source: IICAIET
DOI: 10.1109/iicaiet49801.2020.9257865
Description: Sound waves are a form of energy produced by a vibrating object and transmitted through a medium, where they can be heard. Sound is used in human communication, music, alerts, and more. It also helps us understand which events are occurring at a given moment, thereby providing hints about what is happening around us. This has prompted researchers to study how humans recognize events from sound waves and, in recent years, how to equip machines with the same ability, i.e., acoustic event detection. This study focuses on acoustic event detection leveraging both the frequency spectrogram technique and deep learning methods. First, a spectrogram image is generated from the acoustic data using the frequency spectrogram technique. The generated frequency spectrogram is then fed into a pre-trained MobileNet model to extract robust feature representations. A 1-Dimensional Convolutional Neural Network (1D-CNN) is trained on these features for acoustic event detection. The proposed 1D-CNN consists of alternating convolution and pooling layers; the last pooling layer is flattened and fed into a fully connected layer that classifies the events, and dropout is employed to prevent overfitting. The proposed combination of frequency spectrogram, pre-trained MobileNet, and 1D-CNN is evaluated on three datasets: Soundscapes1, Soundscapes2, and UrbanSound8k. In the experiments, the proposed method obtained F1-scores of 81, 86, and 70 on Soundscapes1, Soundscapes2, and UrbanSound8k, respectively.
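The pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a log-mel spectrogram front end via librosa, Keras's ImageNet-pretrained MobileNet as a frozen feature extractor, and representative 1D-CNN hyper-parameters (filter counts, kernel sizes, dropout rate), since the abstract does not specify these values.

```python
# Sketch of the described pipeline: spectrogram -> MobileNet features -> 1D-CNN.
# All hyper-parameters below are illustrative assumptions, not the paper's values.
import librosa
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

def audio_to_spectrogram(path, sr=22050, img_size=224):
    """Load audio and convert it to a 3-channel log-mel spectrogram image."""
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
    log_mel = librosa.power_to_db(mel, ref=np.max)
    # Normalize to [0, 1], resize to MobileNet's input size, replicate channels.
    img = (log_mel - log_mel.min()) / (log_mel.max() - log_mel.min() + 1e-8)
    img = tf.image.resize(img[..., np.newaxis], (img_size, img_size))
    return tf.repeat(img, 3, axis=-1)  # shape (224, 224, 3)

# Frozen, pre-trained MobileNet used purely as a feature extractor.
backbone = tf.keras.applications.MobileNet(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3))
backbone.trainable = False

def extract_features(spectrograms):
    """Return MobileNet feature maps reshaped into 1D sequences for the 1D-CNN."""
    fmap = backbone.predict(spectrograms)        # (N, 7, 7, 1024)
    return fmap.reshape(len(fmap), -1, 1024)     # (N, 49, 1024)

def build_1d_cnn(num_classes, seq_len=49, feat_dim=1024):
    """Alternating Conv1D/MaxPooling1D layers, then Flatten + Dense, with dropout."""
    return models.Sequential([
        layers.Input(shape=(seq_len, feat_dim)),
        layers.Conv1D(128, 3, activation="relu", padding="same"),
        layers.MaxPooling1D(2),
        layers.Conv1D(64, 3, activation="relu", padding="same"),
        layers.MaxPooling1D(2),
        layers.Flatten(),
        layers.Dropout(0.5),  # dropout to curb overfitting, per the abstract
        layers.Dense(num_classes, activation="softmax"),
    ])
```

Freezing the backbone matches the abstract's use of MobileNet strictly for feature extraction: only the 1D-CNN's weights are learned during training.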
Database: OpenAIRE
External link: