Description: |
Speaker embeddings are ubiquitous, with applications ranging from speaker recognition and diarization to speech synthesis and voice anonymization. The amount of information held by these embeddings lends them versatility but also raises privacy concerns. Speaker embeddings have been shown to contain sensitive information, including the speaker’s age, sex, health state and more – in other words, information that speakers may want to keep private, especially when it is not required for the target task. In this work, we propose a method for removing and manipulating private attribute information in speaker representations that leverages a Vector-Quantized Variational Autoencoder architecture combined with an adversarial classifier and a novel mutual information loss. We validate our model on two attributes, sex and age, and perform experiments to remove or manipulate this information, evaluating privacy against ignorant and informed attackers. The model is tested with in-domain and out-of-domain data to assess its robustness, and the resulting speaker representations are used in a speaker verification scenario to validate their utility. Our results show that our model achieves a strong trade-off between utility and privacy, with age and sex classification results near chance level for both attackers and little impact on speaker verification performance. |