Outdoor Acoustic Event Identification with DNN Using a Quadrotor-Embedded Microphone Array

Authors:	Akihide Nagamine, Keisuke Nakamura, Osamu Sugiyama, Satoshi Uemura, Kazuhiro Nakadai, Ryosuke Kojima
Publication year:	2017
Subject:	0209 industrial biotechnology; Engineering; Microphone array; General Computer Science; Event (computing); business.industry; Speech recognition; 02 engineering and technology; 030507 speech-language pathology & audiology; 03 medical and health sciences; Identification (information); 020901 industrial engineering & automation; Electrical and Electronic Engineering; 0305 other medical science; business
Source:	Journal of Robotics and Mechatronics. 29:188-197
ISSN:	1883-8049 0915-3942
Description:	[Figure: Software architecture for OCASA with proposed AEI] This paper addresses Acoustic Event Identification (AEI) of acoustic signals observed with a microphone array embedded in a quadrotor flying in a noisy outdoor environment. In such an environment, noise generated by rotors, wind, and other sound sources is a major problem. To solve this, we propose combining two recently introduced approaches: Sound Source Separation (SSS) and Sound Source Identification (SSI). SSS improves the Signal-to-Noise Ratio (SNR) of the input sound, and SSI is then performed on the SNR-improved sound. Two SSS methods are investigated. One is a single-channel algorithm, Robust Principal Component Analysis (RPCA), and the other is a multichannel method, Geometric High-order Decorrelation-based Source Separation (GHDSS-AS). For SSI, we investigate two types of deep neural networks, namely Stacked denoising Autoencoder (SdA) and Convolutional Neural Network (CNN), which have been extensively studied as high-performing approaches in automatic speech recognition and visual object recognition. Preliminary experiments have shown the effectiveness of the proposed approaches, the combination of GHDSS-AS and CNN in particular. This combination correctly identified over 80% of sounds in an 8-class sound classification task recorded by a hovering quadrotor. In addition, measurements of prediction time showed that the implemented CNN identifier can run even on a low-end CPU.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::6793c41b026a3252bbad920bdcb3a649 https://doi.org/10.20965/jrm.2017.p0188