Multi-channel target speech extraction with channel decorrelation and target speaker adaptation

Author:	Yijie Li, Yanhua Long, Xinyuan Zhou, Jiangyu Han
Year of publication:	2020
Subject:	Artificial neural network; Computer science; Audio and Speech Processing (eess.AS); Speech recognition; Feature extraction; FOS: Electrical engineering, electronic engineering, information engineering; Representation (mathematics); Adaptation (computer science); Encoder; Spatial analysis; Decorrelation; Communication channel; Electrical Engineering and Systems Science - Audio and Speech Processing
Source:	ICASSP
DOI:	10.48550/arxiv.2010.09191
Description:	End-to-end approaches to single-channel target speech extraction have attracted widespread attention. However, studies of end-to-end multi-channel target speech extraction remain relatively limited. In this work, we propose two methods for exploiting multi-channel spatial information to extract the target speech. The first uses a target speech adaptation layer in a parallel encoder architecture. The second is a channel decorrelation mechanism that extracts inter-channel differential information to enhance the multi-channel encoder representation. We compare the proposed methods with two strong state-of-the-art baselines. Experimental results on the multi-channel reverberant WSJ0 2-mix dataset show that the proposed methods achieve up to 11.2% and 11.5% relative improvements in SDR and SiSDR, respectively, which are, to the best of our knowledge, the best reported results on this task. Comment: 5 pages, 3 figures. Submitted to ICASSP 2021
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::49cbdd6d5cda7f9eef915853a6deafc3