Showing 1-10 of 25
for search: '"Yao, Jixun"'
Streaming voice conversion has become increasingly popular for its potential in real-time applications. The recently proposed DualVC 2 has achieved robust and high-quality streaming voice conversion with a latency of about 180 ms. Nonetheless, the rec
External link:
http://arxiv.org/abs/2406.07846
Advancements in synthesized speech have created a growing threat of impersonation, making it crucial to develop deepfake algorithm recognition. One significant aspect is out-of-distribution (OOD) detection, which has gained notable attention due to i
External link:
http://arxiv.org/abs/2406.02233
Speaker anonymization is an effective privacy protection solution that aims to conceal the speaker's identity while preserving the naturalness and distinctiveness of the original speech. Mainstream approaches use an utterance-level vector from a pre-
External link:
http://arxiv.org/abs/2405.10786
Speaker anonymization aims to conceal a speaker's identity without degrading speech quality and intelligibility. Most speaker anonymization systems disentangle the speaker representation from the original speech and achieve anonymization by averaging
External link:
http://arxiv.org/abs/2310.05051
Authors:
Ning, Ziqian, Jiang, Yuepeng, Zhu, Pengcheng, Wang, Shuai, Yao, Jixun, Xie, Lei, Bi, Mengxiao
Voice conversion is becoming increasingly popular, and a growing number of application scenarios require models with streaming inference capabilities. The recently proposed DualVC attempts to achieve this objective through streaming model architectur
External link:
http://arxiv.org/abs/2309.15496
Authors:
Yao, Jixun, Yang, Yuguang, Lei, Yi, Ning, Ziqian, Hu, Yanni, Pan, Yu, Yin, Jingjing, Zhou, Hongbin, Lu, Heng, Xie, Lei
Style voice conversion aims to transform the style of source speech to a desired style according to real-world application demands. However, the current style voice conversion approach relies on pre-defined labels or reference speech to control the c
External link:
http://arxiv.org/abs/2309.09262
As a type of biometric identification, a speaker identification (SID) system is confronted with various kinds of attacks. The spoofing attacks typically imitate the timbre of the target speakers, while the adversarial attacks confuse the SID system b
External link:
http://arxiv.org/abs/2309.00929
Authors:
Pan, Yu, Yang, Yuguang, Huang, Yuheng, Yao, Jixun, Yin, Jingjing, Hu, Yanni, Lu, Heng, Ma, Lei, Zhao, Jianjun
Despite notable progress, speech emotion recognition (SER) remains challenging due to the intricate and ambiguous nature of speech emotion, particularly in the wild. While current studies primarily focus on recognition and generalization abilities,
External link:
http://arxiv.org/abs/2308.04025
Contrastive cross-modality pretraining has recently exhibited impressive success in diverse fields, whereas there is limited research on its merits in speech emotion recognition (SER). In this paper, we propose GEmo-CLAP, a kind of gender-attribute
External link:
http://arxiv.org/abs/2306.07848
In this study, we propose a timbre-reserved adversarial attack approach for speaker identification (SID) to not only exploit the weakness of the SID model but also preserve the timbre of the target speaker in a black-box attack setting. Particularly,
External link:
http://arxiv.org/abs/2305.19020