COALA: Co-Aligned Autoencoders for Learning Semantically Enriched Audio Representations

Authors:	Xavier Favory, Konstantinos Drosos, Tuomas Oskari Virtanen, Xavier Serra
Contributors:	Tampere University, Computing Sciences
Language:	English
Year of publication:	2020
Subject:	FOS: Computer and information sciences; Computer Science - Machine Learning; Audio and Speech Processing (eess.AS); Statistics - Machine Learning; FOS: Electrical engineering, electronic engineering, information engineering; Machine Learning (stat.ML); 113 Computer and information sciences; Information Retrieval (cs.IR); Machine Learning (cs.LG); Computer Science - Information Retrieval; Electrical Engineering and Systems Science - Audio and Speech Processing
Source:	Tampere University
Description:	Audio representation learning based on deep neural networks (DNNs) has emerged as an alternative to hand-crafted features. To achieve high performance, DNNs often need a large amount of annotated data, which can be difficult and costly to obtain. In this paper, we propose a method for learning audio representations by aligning the learned latent representations of audio and associated tags. The alignment is done by maximizing the agreement between the latent representations of audio and tags, using a contrastive loss. The result is an audio embedding model that reflects both acoustic and semantic characteristics of sounds. We evaluate the quality of our embedding model by measuring its performance as a feature extractor on three different tasks (namely, sound event recognition, music genre classification, and musical instrument classification), and we investigate what type of characteristics the model captures. Our results are promising, sometimes on par with the state of the art in the considered tasks, and the embeddings produced with our method are well correlated with some acoustic descriptors. 8 pages, 1 figure; workshop on Self-supervision in Audio and Speech at the 37th International Conference on Machine Learning (ICML), 2020, Vienna, Austria
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::781283319cd4d6e57fae6c51e831da0b http://arxiv.org/abs/2006.08386
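
Note: The description says that audio and tag latent representations are aligned by maximizing their agreement with a contrastive loss, but the record does not give the formula. The following is only a minimal sketch of one common form of such a loss (a temperature-scaled softmax over cosine similarities within a batch), not the authors' implementation. The function name `contrastive_alignment_loss`, the `temperature` value, and the array shapes are assumptions made for illustration.

```python
import numpy as np


def contrastive_alignment_loss(audio_z, tag_z, temperature=0.1):
    """Illustrative contrastive loss between paired embeddings.

    audio_z, tag_z: arrays of shape (batch, dim); row i of each is
    assumed to come from the same sound (a positive pair). All other
    rows in the batch act as negatives. This is a generic formulation,
    not necessarily the one used in the paper.
    """
    # L2-normalise so that the dot product is a cosine similarity.
    a = audio_z / np.linalg.norm(audio_z, axis=1, keepdims=True)
    t = tag_z / np.linalg.norm(tag_z, axis=1, keepdims=True)

    # Pairwise similarities, scaled by the (assumed) temperature.
    logits = (a @ t.T) / temperature

    # Numerically stable row-wise log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Cross-entropy with the matching tag embedding as the target.
    return float(-np.mean(np.diag(log_probs)))
```

In this sketch the loss is small when each audio embedding is closest to its own tag embedding and grows when the pairing is broken, which is the sense in which minimizing it "maximizes agreement" between the two latent spaces.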