Search results

Report

DualVC 3: Leveraging Language Model Generated Pseudo Context for End-to-end Low Latency Streaming Voice Conversion

Authors: Ning, Ziqian; Wang, Shuai; Zhu, Pengcheng; Wang, Zhichao; Yao, Jixun; Xie, Lei; Bi, Mengxiao

Streaming voice conversion has become increasingly popular for its potential in real-time applications. The recently proposed DualVC 2 has achieved robust and high-quality streaming voice conversion with a latency of about 180ms. Nonetheless, the rec

External link: http://arxiv.org/abs/2406.07846

Report

Towards Out-of-Distribution Detection in Vocoder Recognition via Latent Feature Reconstruction

Authors: Du, Renmingyue; Yao, Jixun; Kong, Qiuqiang; Cao, Yin

Advancements in synthesized speech have created a growing threat of impersonation, making it crucial to develop deepfake algorithm recognition. One significant aspect is out-of-distribution (OOD) detection, which has gained notable attention due to i

External link: http://arxiv.org/abs/2406.02233

Report

Distinctive and Natural Speaker Anonymization via Singular Value Transformation-assisted Matrix

Authors: Yao, Jixun; Wang, Qing; Guo, Pengcheng; Ning, Ziqian; Xie, Lei

Speaker anonymization is an effective privacy protection solution that aims to conceal the speaker's identity while preserving the naturalness and distinctiveness of the original speech. Mainstream approaches use an utterance-level vector from a pre-

External link: http://arxiv.org/abs/2405.10786

Report

SALT: Distinguishable Speaker Anonymization Through Latent Space Transformation

Authors: Lv, Yuanjun; Yao, Jixun; Chen, Peikun; Zhou, Hongbin; Lu, Heng; Xie, Lei

Speaker anonymization aims to conceal a speaker's identity without degrading speech quality and intelligibility. Most speaker anonymization systems disentangle the speaker representation from the original speech and achieve anonymization by averaging

External link: http://arxiv.org/abs/2310.05051

Report

DualVC 2: Dynamic Masked Convolution for Unified Streaming and Non-Streaming Voice Conversion

Authors: Ning, Ziqian; Jiang, Yuepeng; Zhu, Pengcheng; Wang, Shuai; Yao, Jixun; Xie, Lei; Bi, Mengxiao

Voice conversion is becoming increasingly popular, and a growing number of application scenarios require models with streaming inference capabilities. The recently proposed DualVC attempts to achieve this objective through streaming model architectur

External link: http://arxiv.org/abs/2309.15496

Report

PromptVC: Flexible Stylistic Voice Conversion in Latent Space Driven by Natural Language Prompts

Authors: Yao, Jixun; Yang, Yuguang; Lei, Yi; Ning, Ziqian; Hu, Yanni; Pan, Yu; Yin, Jingjing; Zhou, Hongbin; Lu, Heng; Xie, Lei

Style voice conversion aims to transform the style of source speech to a desired style according to real-world application demands. However, the current style voice conversion approach relies on pre-defined labels or reference speech to control the c

External link: http://arxiv.org/abs/2309.09262

Report

Timbre-reserved Adversarial Attack in Speaker Identification

Authors: Wang, Qing; Yao, Jixun; Zhang, Li; Guo, Pengcheng; Xie, Lei

As a type of biometric identification, a speaker identification (SID) system is confronted with various kinds of attacks. The spoofing attacks typically imitate the timbre of the target speakers, while the adversarial attacks confuse the SID system b

External link: http://arxiv.org/abs/2309.00929

Report

MSAC: Multiple Speech Attribute Control Method for Reliable Speech Emotion Recognition

Authors: Pan, Yu; Yang, Yuguang; Huang, Yuheng; Yao, Jixun; Yin, Jingjing; Hu, Yanni; Lu, Heng; Ma, Lei; Zhao, Jianjun

Despite notable progress, speech emotion recognition (SER) remains challenging due to the intricate and ambiguous nature of speech emotion, particularly in the wild. While current studies primarily focus on recognition and generalization abilities,

External link: http://arxiv.org/abs/2308.04025

Report

GEmo-CLAP: Gender-Attribute-Enhanced Contrastive Language-Audio Pretraining for Accurate Speech Emotion Recognition

Authors: Pan, Yu; Hu, Yanni; Yang, Yuguang; Fei, Wen; Yao, Jixun; Lu, Heng; Ma, Lei; Zhao, Jianjun

Contrastive cross-modality pretraining has recently exhibited impressive success in diverse fields, whereas there is limited research on their merits in speech emotion recognition (SER). In this paper, we propose GEmo-CLAP, a kind of gender-attribute

External link: http://arxiv.org/abs/2306.07848

Report

Pseudo-Siamese Network based Timbre-reserved Black-box Adversarial Attack in Speaker Identification

Authors: Wang, Qing; Yao, Jixun; Wang, Ziqian; Guo, Pengcheng; Xie, Lei

In this study, we propose a timbre-reserved adversarial attack approach for speaker identification (SID) to not only exploit the weakness of the SID model but also preserve the timbre of the target speaker in a black-box attack setting. Particularly,

External link: http://arxiv.org/abs/2305.19020
