Showing 1 - 10 of 376 for search: '"Chen, Zhengyang"'
Author:
Chen, Zhengyang, Wang, Shuai, Zhang, Mingyang, Liu, Xuechen, Yamagishi, Junichi, Qian, Yanmin
Voice conversion (VC) aims to modify the speaker's timbre while retaining speech content. Previous approaches have tokenized the outputs from self-supervised models into semantic tokens, facilitating disentanglement of speech content information. Recently,
External link:
http://arxiv.org/abs/2409.05004
Speaker diarization is typically considered a discriminative task, using discriminative approaches to produce fixed diarization results. In this paper, we explore the use of neural network-based generative methods for speaker diarization for the firs
External link:
http://arxiv.org/abs/2409.04859
Speaker individuality information is among the most critical elements within speech signals. By thoroughly and accurately modeling this information, it can be utilized in various intelligent speech applications, such as speaker recognition, speaker d
External link:
http://arxiv.org/abs/2407.15188
This paper proposes a speech synthesis system that allows users to specify and control the acoustic characteristics of a speaker by means of prompts describing the speaker's traits of synthesized speech. Unlike previous approaches, our method utilize
External link:
http://arxiv.org/abs/2406.08812
Traditional speaker diarization seeks to detect "who spoke when" according to speaker characteristics. Extending to target speech diarization, we detect "when target event occurs" according to the semantic characteristics of speech. We propose a
External link:
http://arxiv.org/abs/2406.07198
We introduce a novel task named 'target speech diarization', which seeks to determine 'when target event occurred' within an audio signal. We devise a neural architecture called Prompt-driven Target Speech Diarization (PTSD), that works with diverse
External link:
http://arxiv.org/abs/2310.14823
Author:
Wang, Shuai, Bai, Qibing, Liu, Qi, Yu, Jianwei, Chen, Zhengyang, Han, Bing, Qian, Yanmin, Li, Haizhou
Current speaker recognition systems primarily rely on supervised approaches, constrained by the scale of labeled datasets. To boost the system performance, researchers leverage large pretrained models such as WavLM to transfer learned high-level feat
External link:
http://arxiv.org/abs/2309.11730
Deep neural network-based systems have significantly improved the performance of speaker diarization tasks. However, end-to-end neural diarization (EEND) systems often struggle to generalize to scenarios with an unseen number of speakers, while targe
External link:
http://arxiv.org/abs/2309.06672
The mismatch between closed-set training and open-set testing usually leads to significant performance degradation for the speaker verification task. For existing loss functions, metric learning-based objectives depend strongly on searching effective pair
External link:
http://arxiv.org/abs/2307.08205
Author:
Wang, Shuai, Liang, Chengdong, Xiang, Xu, Han, Bing, Chen, Zhengyang, Wang, Hongji, Ding, Wen
This report showcases the results achieved using the wespeaker toolkit for the VoxSRC2023 Challenge. Our aim is to provide participants, especially those with limited experience, with clear and straightforward guidelines to develop their initial syst
External link:
http://arxiv.org/abs/2306.15161