Improving Semi-Supervised Learning for Audio Classification with FixMatch

Authors:	Sascha Grollmisch, Estefanía Cano
Contributors:	Publica
Language:	English
Publication year:	2021
Subject:	semi-supervised learning; TK7800-8360; Computer Networks and Communications; Computer science; Process (engineering); music information retrieval; 02 engineering and technology; Semi-supervised learning; Machine learning; computer.software_genre; Convolutional neural network; industrial sound analysis; 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; Music information retrieval; Electrical and Electronic Engineering; computer.programming_language; Artificial neural network; business.industry; Deep learning; deep learning; Hardware and Architecture; Control and Systems Engineering; Scratch; Signal Processing; 020201 artificial intelligence & image processing; Artificial intelligence; acoustic scene classification; Electronics; business; Transfer of learning; computer
Source:	Electronics, Vol. 10, Iss. 15, Article 1807 (2021)
ISSN:	2079-9292
DOI:	10.3390/electronics10151807
Description:	Including unlabeled data in the training process of neural networks using Semi-Supervised Learning (SSL) has shown impressive results in the image domain, where state-of-the-art results were obtained with only a fraction of the labeled data. Recent SSL methods share a strong reliance on the augmentation of unannotated data, an approach that remains largely unexplored for audio data. In this work, SSL using the state-of-the-art FixMatch approach is evaluated on three audio classification tasks covering music, industrial sounds, and acoustic scenes. The performance of FixMatch is compared to Convolutional Neural Networks (CNNs) trained from scratch, Transfer Learning, and SSL using the Mean Teacher approach. Additionally, a simple yet effective approach for selecting suitable augmentation methods for FixMatch is introduced. FixMatch with the proposed modifications always outperformed Mean Teacher and the CNNs trained from scratch. For the industrial sounds and music datasets, the CNN baseline performance using the full dataset was reached with less than 5% of the initial training data, demonstrating the potential of recent SSL methods for audio data. Transfer Learning outperformed FixMatch only on the most challenging dataset, from acoustic scene classification, showing that there is still room for improvement.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::8cf59f967489a1ae3cb976f04138569f