A Two-Stage Approach to Device-Robust Acoustic Scene Classification

Author:	Juanjuan Li, Yuanjun Zhao, Yannan Wang, Hu Hu, Hongning Zhu, Feng Bao, Shu-Tong Niu, Li Chai, Sabato Marco Siniscalchi, Chin-Hui Lee, Yajian Wang, Xin Tang, Xianjun Xia, Jun Du, Chao-Han Huck Yang, Xue Bai
Year:	2021
Subject:	FOS: Computer and information sciences; Sound (cs.SD); Computer Science - Machine Learning; Signal processing; Computer Science - Artificial Intelligence; Computer science; Class activation mapping; Computer Science - Neural and Evolutionary Computing; Pattern recognition; Convolutional neural network; Computer Science - Sound; Machine Learning (cs.LG); Data modeling; Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS); Robustness (computer science); FOS: Electrical engineering, electronic engineering, information engineering; Neural and Evolutionary Computing (cs.NE); Artificial intelligence; Electrical Engineering and Systems Science - Audio and Speech Processing; Test data
Source:	ICASSP
DOI:	10.1109/icassp39728.2021.9414835
Description:	To improve device robustness, a highly desirable feature of a competitive data-driven acoustic scene classification (ASC) system, a novel two-stage system based on fully convolutional neural networks (CNNs) is proposed. Our two-stage system leverages an ad hoc score combination of two CNN classifiers: (i) the first CNN classifies acoustic inputs into one of three broad classes, and (ii) the second CNN classifies the same inputs into one of ten finer-grained classes. Three different CNN architectures are explored to implement the two-stage classifiers, and a frequency sub-sampling scheme is investigated. Novel data augmentation schemes for ASC are also investigated. Evaluated on DCASE 2020 Task 1a, the proposed ASC system attains state-of-the-art accuracy on the development set: our best system, a two-stage fusion of CNN ensembles, delivers an 81.9% average accuracy on multi-device test data and obtains a significant improvement on unseen devices. Finally, neural saliency analysis with class activation mapping (CAM) gives new insights into the patterns learnt by our models. Submitted to ICASSP 2021. Code available: https://github.com/MihawkHu/DCASE2020_task1
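The abstract does not spell out the exact score combination. The following is a minimal sketch, assuming that each fine-class posterior from the 10-class CNN is multiplied by the posterior of its broad class from the 3-class CNN and then renormalized, and that each stage's "ensemble" is a plain average of member posteriors. The coarse-to-fine label mapping (indoor / outdoor / transportation, as in the DCASE 2020 label sets) and all function names here are hypothetical illustrations, not the authors' code.

```python
from typing import Dict, List, Sequence

# ASSUMPTION: DCASE 2020 fine labels and their broad (coarse) classes.
FINE_CLASSES: List[str] = [
    "airport", "bus", "metro", "metro_station", "park",
    "public_square", "shopping_mall", "street_pedestrian",
    "street_traffic", "tram",
]
COARSE_CLASSES: List[str] = ["indoor", "outdoor", "transportation"]
FINE_TO_COARSE: Dict[str, str] = {
    "airport": "indoor", "shopping_mall": "indoor", "metro_station": "indoor",
    "park": "outdoor", "public_square": "outdoor",
    "street_pedestrian": "outdoor", "street_traffic": "outdoor",
    "bus": "transportation", "metro": "transportation", "tram": "transportation",
}
COARSE_INDEX: Dict[str, int] = {c: i for i, c in enumerate(COARSE_CLASSES)}


def average_ensemble(prob_sets: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical ensemble rule: average the posteriors of several models."""
    n_models = len(prob_sets)
    n_classes = len(prob_sets[0])
    return [sum(p[i] for p in prob_sets) / n_models for i in range(n_classes)]


def two_stage_scores(p_fine: Sequence[float], p_coarse: Sequence[float]) -> List[float]:
    """Weight each fine-class posterior by its broad-class posterior, then renormalize.

    ASSUMPTION: multiplicative fusion; the paper's exact rule may differ.
    """
    scores = [
        p_fine[i] * p_coarse[COARSE_INDEX[FINE_TO_COARSE[label]]]
        for i, label in enumerate(FINE_CLASSES)
    ]
    total = sum(scores)
    return [s / total for s in scores]


def predict(p_fine: Sequence[float], p_coarse: Sequence[float]) -> str:
    """Return the fine label with the highest fused score."""
    scores = two_stage_scores(p_fine, p_coarse)
    best = max(range(len(scores)), key=scores.__getitem__)
    return FINE_CLASSES[best]


if __name__ == "__main__":
    # The 10-class model is torn between "metro" and "metro_station" ...
    p_fine = [0.05, 0.05, 0.30, 0.30, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05]
    # ... while the 3-class model is confident the scene is "transportation".
    p_coarse = [0.1, 0.1, 0.8]
    print(predict(p_fine, p_coarse))
```

In this toy input the broad-class posterior breaks the tie in favor of "metro" (0.30 × 0.8 versus 0.30 × 0.1 before renormalization), which illustrates how a coarse classifier can correct confusions between acoustically similar fine classes that belong to different broad classes.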
Database:	OpenAIRE
External links:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::4e6f2ee73dd624ce7691c8ef7f24e48f https://doi.org/10.1109/icassp39728.2021.9414835