Speaker Anonymization for Personal Information Protection Using Voice Conversion Techniques
| Author: | Dongsuk Yook, In-Chul Yoo, Hyunwoo Oh, Keonnyeong Lee, Seong-Gyun Leem, BongGu Ko |
| --- | --- |
| Year of publication: | 2020 |
| Subject: | speaker anonymization; voice conversion; variational autoencoder; deep neural networks; speaker recognition; speech recognition; data privacy; personally identifiable information |
| Source: | IEEE Access, Vol. 8, pp. 198637-198645 (2020) |
| ISSN: | 2169-3536 |
| Description: | As speech-based user interfaces integrated into devices such as AI speakers become ubiquitous, large amounts of user voice data are being collected to improve the accuracy of speech recognition systems. Because such voice data contain personal information that can endanger users' privacy, privacy protection for speech data has attracted increasing attention since the introduction of the General Data Protection Regulation in the EU, which makes restrictions and safety measures for the use of speech data essential. This study aims to filter out speaker-related voice biometrics, such as the voice fingerprint, present in speech data without altering the linguistic content, thereby preserving the usefulness of the data while protecting users' privacy. To this end, we propose an algorithm that produces anonymized speech by adopting many-to-many voice conversion techniques based on variational autoencoders (VAEs) and modifying the speaker identity vectors at the VAE input. We validated the effectiveness of the proposed method by measuring the speaker-related information and the original linguistic information retained in the resulting speech, using an open-source speaker recognizer and a deep neural network-based automatic speech recognizer, respectively. With the proposed method, the speaker identification accuracy on the anonymized speech was reduced to 0.1-9.2%, indicating successful anonymization, while the speech recognition accuracy was maintained at 78.2-81.3%. |
| Database: | OpenAIRE |
| External link: | |
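
The description above outlines a many-to-many, VAE-based voice conversion approach in which the speaker identity vector fed to the model is altered to anonymize the speech. The following is a minimal, hypothetical PyTorch sketch of that idea; the layer sizes, the `anonymize` helper, and the strategy of averaging other speakers' identity vectors are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: a speaker-conditioned VAE whose decoder takes a speaker
# identity vector. Anonymization re-synthesizes features with a replaced
# (pseudo-)speaker vector, keeping the linguistic content carried by the latent
# code while discarding the original speaker's voice characteristics.
import torch
import torch.nn as nn


class SpeakerConditionedVAE(nn.Module):
    def __init__(self, feat_dim=80, spk_dim=64, latent_dim=32):
        super().__init__()
        # Encoder maps acoustic features to a latent "content" distribution.
        self.enc = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)
        # Decoder reconstructs features from the latent code plus a speaker vector.
        self.dec = nn.Sequential(
            nn.Linear(latent_dim + spk_dim, 256), nn.ReLU(),
            nn.Linear(256, feat_dim),
        )

    def encode(self, x):
        h = self.enc(x)
        return self.mu(h), self.logvar(h)

    def decode(self, z, spk_vec):
        return self.dec(torch.cat([z, spk_vec], dim=-1))

    def forward(self, x, spk_vec):
        mu, logvar = self.encode(x)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.decode(z, spk_vec), mu, logvar


def anonymize(model, feats, speaker_pool):
    """Re-synthesize features with a pseudo-speaker vector.

    `speaker_pool` is a (num_speakers, spk_dim) tensor of identity vectors;
    averaging several of them is one simple (assumed) way to obtain a voice
    that matches none of the original speakers.
    """
    with torch.no_grad():
        mu, _ = model.encode(feats)            # latent code keeps linguistic content
        pseudo_spk = speaker_pool.mean(dim=0)  # speaker identity is replaced
        pseudo_spk = pseudo_spk.expand(feats.size(0), -1)
        return model.decode(mu, pseudo_spk)


if __name__ == "__main__":
    model = SpeakerConditionedVAE()
    frames = torch.randn(100, 80)   # e.g., 100 frames of 80-dim acoustic features
    pool = torch.randn(10, 64)      # identity vectors of 10 other speakers
    anon = anonymize(model, frames, pool)
    print(anon.shape)               # torch.Size([100, 80])
```

In this sketch the latent code is intended to carry the linguistic content while the speaker vector carries the voice identity, so swapping the vector at decoding time is what removes the speaker-related biometrics.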