Polyphonic training set synthesis improves self-supervised urban sound classification

Authors:	Félix Gontier, Vincent Lostanlen, Jean-François Petiot, Nicolas Fortin, Catherine Lavandier, Mathieu Lagrange
Contributors:	Laboratoire des Sciences du Numérique de Nantes (LS2N); IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique); Institut Mines-Télécom [Paris] (IMT); Université de Nantes - UFR des Sciences et des Techniques (UN UFR ST); Université de Nantes (UN); École Centrale de Nantes (ECN); Centre National de la Recherche Scientifique (CNRS); Université Gustave Eiffel; Université de Cergy Pontoise (UCP); Université Paris-Seine; Lagrange, Mathieu; ANR-16-CE22-0012, CENSE, Caractérisation des environnements sonores urbains : vers une approche globale associant données libres, mesures et modélisations (2016), CENSE2016, AAPG2016, VALID
Language:	English
Year of publication:	2021
Subject:	[INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI]; [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG]; [INFO.INFO-MM] Computer Science [cs]/Multimedia [cs.MM]; [INFO.INFO-TS] Computer Science [cs]/Signal and Image Processing; [SPI.SIGNAL] Engineering Sciences [physics]/Signal and Image processing; [STAT.ML] Statistics [stat]/Machine Learning [stat.ML]; Acoustics; Acoustics and Ultrasonics; Artificial intelligence; Arts and Humanities (miscellaneous); Computer science; Conjunction (grammar); Data curation; Face (geometry); Ground truth; Inpainting; Machine learning; Machine listening; Sound; Spectrogram; Task (project management); 01 natural sciences; 0103 physical sciences; 010301 acoustics; 02 engineering and technology; 0202 electrical engineering, electronic engineering, information engineering; 020206 networking & telecommunications
Source:	Journal of the Acoustical Society of America, Acoustical Society of America, 2021
ISSN:	0001-4966, 1520-8524
Description:	International audience; Machine listening systems for environmental acoustic monitoring face a shortage of expert annotations to be used as training data. To circumvent this issue, the emerging paradigm of self-supervised learning proposes to pre-train audio classifiers on a task whose ground truth is trivially available. Alternatively, training set synthesis consists in annotating a small corpus of acoustic events of interest, which are then automatically mixed at random to form a larger corpus of polyphonic scenes. Prior studies have considered these two paradigms in isolation, but rarely in conjunction. Furthermore, the impact of data curation in training set synthesis remains unclear. To fill this gap in research, this article proposes a two-stage approach. In the self-supervised stage, we formulate a pretext task (Audio2Vec skip-gram inpainting) on unlabeled spectrograms from an acoustic sensor network. Then, in the supervised stage, we formulate a downstream task of multilabel urban sound classification on synthetic scenes. We find that training set synthesis contributes more to overall performance than self-supervised learning. Interestingly, the geographical origin of the acoustic events in training set synthesis appears to have a decisive impact.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::0bdb54a3eeb5369c375d8af5b4d1e053 https://hal.archives-ouvertes.fr/hal-03262863
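
Illustrative note: the description characterizes training set synthesis as mixing a small corpus of annotated acoustic events at random into polyphonic scenes, which then serve a multilabel classification task. The following is a minimal sketch of that idea only, not the authors' implementation. The function name `synthesize_scene`, its parameters, the gain range, and the cap on events per scene are all assumptions made for illustration; the record does not specify any of them.

```python
import numpy as np


def synthesize_scene(events, labels, n_classes, scene_len, max_events=3, rng=None):
    """Mix randomly chosen isolated events into one scene.

    Hypothetical helper: returns (waveform, multilabel target vector).
    `events` is a list of 1-D float arrays, `labels` the class index of each.
    """
    rng = np.random.default_rng() if rng is None else rng
    scene = np.zeros(scene_len, dtype=np.float32)
    target = np.zeros(n_classes, dtype=np.float32)

    # Draw how many events overlap in this scene (assumed range: 1..max_events).
    n_events = int(rng.integers(1, min(max_events, len(events)) + 1))

    for idx in rng.choice(len(events), size=n_events, replace=False):
        event = events[idx][:scene_len]            # truncate overly long events
        onset = int(rng.integers(0, scene_len - len(event) + 1))
        gain = rng.uniform(0.3, 1.0)               # assumed random mixing gain
        scene[onset:onset + len(event)] += gain * event
        target[labels[idx]] = 1.0                  # mark the class as present

    return scene, target
```

Calling this function repeatedly over a small annotated corpus yields an arbitrarily large set of (scene, target) pairs. A real pipeline would presumably also control signal-to-noise ratio against a background and convert scenes to spectrograms, neither of which is shown here.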