Disentangling speech from surroundings with neural embeddings

Author:	Omran, Ahmed; Zeghidour, Neil; Borsos, Zalán; Quitry, Félix de Chaumont; Slaney, Malcolm; Tagliasacchi, Marco
Year of publication:	2022
Subject:	Computer Science - Sound; Computer Science - Machine Learning; Electrical Engineering and Systems Science - Audio and Speech Processing
Document type:	Working Paper
Description:	We present a method to separate speech signals from noisy environments in the embedding space of a neural audio codec. We introduce a new training procedure that allows our model to produce structured encodings of audio waveforms given by embedding vectors, where one part of the embedding vector represents the speech signal and the rest represents the environment. We achieve this by partitioning the embeddings of different input waveforms and training the model to faithfully reconstruct audio from mixed partitions, thereby ensuring that each partition encodes a separate audio attribute. As use cases, we demonstrate the separation of speech from background noise or from reverberation characteristics. Our method also allows for targeted adjustments of the audio output characteristics. Comment: Accepted at ICASSP 2023
Database:	arXiv
External link:	http://arxiv.org/abs/2203.15578
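
The partition-mixing step summarized in the description can be illustrated with a small sketch. It is not the authors' implementation: the contiguous layout (speech dimensions first, environment dimensions last), the split point speech_dim, and the function names split_embedding and mix_partitions are all assumptions made here for illustration. The encoder, decoder, and reconstruction loss of the codec are omitted; the sketch only shows how partitions taken from two embeddings might be recombined before decoding.

```python
from typing import List, Tuple

Embedding = List[float]


def split_embedding(z: Embedding, speech_dim: int) -> Tuple[Embedding, Embedding]:
    """Split one embedding into a (speech, environment) pair.

    Assumption: the first `speech_dim` entries hold the speech partition and
    the remaining entries hold the environment partition. The record does not
    state how the partitions are actually laid out.
    """
    if not 0 < speech_dim < len(z):
        raise ValueError("speech_dim must leave both partitions non-empty")
    return z[:speech_dim], z[speech_dim:]


def mix_partitions(
    z_a: Embedding, z_b: Embedding, speech_dim: int
) -> Tuple[Embedding, Embedding]:
    """Swap the environment partitions of two embeddings (hypothetical helper).

    Returns (speech of A + environment of B, speech of B + environment of A).
    """
    if len(z_a) != len(z_b):
        raise ValueError("embeddings must have the same length")
    speech_a, env_a = split_embedding(z_a, speech_dim)
    speech_b, env_b = split_embedding(z_b, speech_dim)
    return speech_a + env_b, speech_b + env_a


# Toy embeddings standing in for codec encoder outputs of two waveforms.
z_a = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
z_b = [1.1, 1.2, 1.3, 1.4, 1.5, 1.6]

mixed_ab, mixed_ba = mix_partitions(z_a, z_b, speech_dim=4)
```

In a training setup of the kind the description outlines, a decoder would reconstruct audio from such mixed embeddings, which is what would push each partition to carry a single attribute. How the reconstruction targets are chosen is not specified in this record and is left out of the sketch.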