At the Speed of Sound: Efficient Audio Scene Classification

Authors:	Latifur Khan, Cristian Lumezanu, Haifeng Chen, Yuncong Chen, Dongjin Song, Mizoguchi Takehiko, Bo Dong
Publication year:	2020
Subject:	Computer science; business.industry; media_common.quotation_subject; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; 020206 networking & telecommunications; 02 engineering and technology; Recurrent neural network; Speed of sound; 0202 electrical engineering, electronic engineering, information engineering; Robot; 020201 artificial intelligence & image processing; Relevance (information retrieval); Active listening; Computer vision; Artificial intelligence; Architecture; Function (engineering); business; media_common
Source:	ICMR
DOI:	10.1145/3372278.3390730
Description:	Efficient audio scene classification is essential for smart sensing platforms such as robots, medical monitoring, surveillance, or autonomous vehicles. We propose a retrieval-based scene classification architecture that combines recurrent neural networks and attention to compute embeddings for short audio segments. We train our framework using a custom audio loss function that captures both the relevance of audio segments within a scene and that of sound events within a segment. Using experiments on real audio scenes, we show that we can discriminate audio scenes with high accuracy after listening in for less than a second. This preserves 93% of the detection accuracy obtained after hearing the entire scene.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::1aecf30ed888eea0fb08e8a375b22e2a https://doi.org/10.1145/3372278.3390730