An Attribute-Aligned Strategy for Learning Speech Representation

Author:	Huang, Yu-Lin; Su, Bo-Hao; Hong, Y.-W. Peter; Lee, Chi-Chun
Year of publication:	2021
Subject:	Electrical Engineering and Systems Science - Audio and Speech Processing; Computer Science - Machine Learning
Source:	Proceedings of INTERSPEECH 2021
Document type:	Working Paper
DOI:	10.21437/Interspeech.2021-1341
Description:	Advances in speech technology have brought convenience to our lives. However, concern is rising because speech signals carry multiple personal attributes, which can lead either to leakage of sensitive information or to biased decisions. In this work, we propose an attribute-aligned learning strategy to derive speech representations that can flexibly address these issues through an attribute-selection mechanism. Specifically, we propose a layered-representation variational autoencoder (LR-VAE), which factorizes the speech representation into attribute-sensitive nodes, to derive an identity-free representation for speech emotion recognition (SER) and an emotionless representation for speaker verification (SV). Our proposed method achieves competitive performance on identity-free SER and better performance on emotionless SV, compared with the current state-of-the-art method of adversarial learning applied to a large emotion corpus, MSP-Podcast. In addition, our proposed learning strategy reduces the model and training process needed to achieve multiple privacy-preserving tasks. Comment: 5 pages, 2 figures; accepted at Interspeech 2021
Database:	arXiv
External link:	http://arxiv.org/abs/2106.02810