Reversible speaker de-identification using pre-trained transformation functions
Authors: | Eduardo Rodriguez-Banga, Carmen García-Mateo, Daniel Erro, Laura Docio-Fernandez, Carmen Magariños, Paula Lopez-Otero |
---|---|
Publication year: | 2017 |
Subject: |
Computer science
Speech recognition; Amplitude scaling; De-identification; 020206 networking & telecommunications; 02 engineering and technology; Speaker recognition; Theoretical Computer Science; Universality (dynamical systems); Human-Computer Interaction; Speaker diarisation; 030507 speech-language pathology & audiology; 03 medical and health sciences; Naturalness; 0202 electrical engineering, electronic engineering, information engineering; Image warping; 0305 other medical science; Software; Drawback |
Source: | Computer Speech & Language. 46:36-52 |
ISSN: | 0885-2308 |
DOI: | 10.1016/j.csl.2017.05.001 |
Description: | Speaker de-identification approaches must accomplish three main goals: universality, naturalness and reversibility. The main drawback of the traditional approach to speaker de-identification using voice conversion techniques is its lack of universality, since a parallel corpus between the input and target speakers is necessary to train the conversion parameters. It is possible to make use of a synthetic target to overcome this issue, but this harms the naturalness of the resulting de-identified speech. Hence, a technique is proposed in this paper in which a pool of pre-trained transformations between a set of speakers is used as follows: given a new user to de-identify, the speaker in this set most similar to the user is chosen as the source speaker, and the speaker most dissimilar to the source speaker is chosen as the target speaker. Speaker similarity is measured using the i-vector paradigm, which is usually employed as an objective measure of speaker de-identification performance, leading to a system with high de-identification accuracy. The transformation method is based on frequency warping and amplitude scaling, in order to obtain natural-sounding speech while masking the identity of the speaker. In addition, compared to other voice conversion approaches, the proposed method is easily reversible. Experiments were conducted on the Albayzin database, and performance was evaluated in terms of objective and subjective measures. The results showed high success in de-identifying speech, as well as high naturalness of the transformed voices. In addition, when the transformation parameters are made available to a trusted holder, the de-identification procedure can be inverted, recovering the original speaker identity. The computational cost of the proposed approach is small, making it possible to produce de-identified speech in real time with a high level of naturalness. |
Database: | OpenAIRE |
External link: |
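
The speaker-selection step described in the abstract (pick the pool speaker most similar to the new user as the source, then the pool speaker most dissimilar to that source as the target) can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the record does not say how i-vectors are compared, so cosine similarity is an assumption here, and the function names, speaker labels and three-dimensional toy vectors are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    # Assumed similarity measure; the record only says "i-vector paradigm".
    # Vectors must be non-zero, otherwise the division below fails.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def select_source_and_target(user_ivector, pool):
    """Pick (source, target) speaker ids from a pool of i-vectors.

    pool maps a speaker id to that speaker's i-vector (list of floats).
    """
    if len(pool) < 2:
        raise ValueError("the pool needs at least two speakers")
    # Source: the pool speaker most similar to the new user.
    source = max(pool, key=lambda s: cosine_similarity(user_ivector, pool[s]))
    # Target: the pool speaker most dissimilar to the chosen source.
    target = min(
        (s for s in pool if s != source),
        key=lambda s: cosine_similarity(pool[source], pool[s]),
    )
    return source, target


# Toy pool with made-up vectors (real i-vectors have hundreds of dimensions).
pool = {
    "spk_a": [1.0, 0.0, 0.0],
    "spk_b": [0.0, 1.0, 0.0],
    "spk_c": [-1.0, 0.1, 0.0],
}
source, target = select_source_and_target([0.9, 0.1, 0.0], pool)
print(source, target)
```

In a system like the one described, the selected pair would then index the pre-trained transformation to apply; that lookup is omitted because the record gives no detail about how the transformations are stored.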