Disentangled Speaker Representation Learning via Mutual Information Minimization

Autor:	Mun, Sung Hwan, Han, Min Hyun, Kim, Minchan, Lee, Dongjune, Kim, Nam Soo
Rok vydání:	2022
Předmět:	Electrical Engineering and Systems Science - Audio and Speech Processing Computer Science - Sound
Druh dokumentu:	Working Paper
Popis:	Domain mismatch problem caused by speaker-unrelated feature has been a major topic in speaker recognition. In this paper, we propose an explicit disentanglement framework to unravel speaker-relevant features from speaker-unrelated features via mutual information (MI) minimization. To achieve our goal of minimizing MI between speaker-related and speaker-unrelated features, we adopt a contrastive log-ratio upper bound (CLUB), which exploits the upper bound of MI. Our framework is constructed in a 3-stage structure. First, in the front-end encoder, input speech is encoded into shared initial embedding. Next, in the decoupling block, shared initial embedding is split into separate speaker-related and speaker-unrelated embeddings. Finally, disentanglement is conducted by MI minimization in the last stage. Experiments on Far-Field Speaker Verification Challenge 2022 (FFSVC2022) demonstrate that our proposed framework is effective for disentanglement. Also, to utilize domain-unknown datasets containing numerous speakers, we pre-trained the front-end encoder with VoxCeleb datasets. We then fine-tuned the speaker embedding model in the disentanglement framework with FFSVC 2022 dataset. The experimental results show that fine-tuning with a disentanglement framework on a existing pre-trained model is valid and can further improve performance. Comment: Accepted by APSIPA ASC 2022. Camera-ready. 8 pages, 4 figures, and 1 table
Databáze:	arXiv
Externí odkaz:	http://arxiv.org/abs/2208.08012 Zobrazit plný text záznamu View this record from Arxiv