SEANet: A Multi-modal Speech Enhancement Network

Authors:	Karolis Misiunas, Yunpeng Li, Marco Tagliasacchi, Dominik Roblek
Language:	English
Year of publication:	2020
Subject:	FOS: Computer and information sciences; Sound (cs.SD); Computer Science - Machine Learning; Audio signal; Computer science; Speech recognition; Accelerometer; Signal; Computer Science - Sound; Loudness; Machine Learning (cs.LG); Speech enhancement; Noise; Modal; Feature (computer vision); Audio and Speech Processing (eess.AS); FOS: Electrical engineering, electronic engineering, information engineering; Electrical Engineering and Systems Science - Audio and Speech Processing
Source:	INTERSPEECH
Description:	We explore the possibility of leveraging accelerometer data to perform speech enhancement in very noisy conditions. Although the user's speech can be only partially reconstructed from the accelerometer signal, the accelerometer provides a strong conditioning signal that is not influenced by noise sources in the environment. Based on this observation, we feed a multi-modal input to SEANet (Sound EnhAncement Network), a wave-to-wave fully convolutional model that adopts a combination of feature losses and adversarial losses to reconstruct an enhanced version of the user's speech. We trained our model on data collected by sensors mounted on an earbud and synthetically corrupted by adding different kinds of noise sources to the audio signal. Our experimental results demonstrate that it is possible to achieve very high quality results, even in the case of interfering speech at the same level of loudness. A sample of the output produced by our model is available at https://google-research.github.io/seanet/multimodal/speech. Accepted to INTERSPEECH 2020.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::9e1ec4dc6f6ba2306d9ab9ae17dd9abf http://arxiv.org/abs/2009.02095