Search results - "Chen, Zhengyang"

Report

Disentangling the Prosody and Semantic Information with Pre-trained Model for In-Context Learning based Zero-Shot Voice Conversion

Authors: Chen, Zhengyang; Wang, Shuai; Zhang, Mingyang; Liu, Xuechen; Yamagishi, Junichi; Qian, Yanmin

Voice conversion (VC) aims to modify the speaker's timbre while retaining speech content. Previous approaches have tokenized the outputs from self-supervised models into semantic tokens, facilitating disentanglement of speech content information. Recently,

External link: http://arxiv.org/abs/2409.05004

Report

Flow-TSVAD: Target-Speaker Voice Activity Detection via Latent Flow Matching

Authors: Chen, Zhengyang; Han, Bing; Wang, Shuai; Jiang, Yidi; Qian, Yanmin

Speaker diarization is typically considered a discriminative task, using discriminative approaches to produce fixed diarization results. In this paper, we explore the use of neural network-based generative methods for speaker diarization for the firs

External link: http://arxiv.org/abs/2409.04859

Report

Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning

Authors: Wang, Shuai; Chen, Zhengyang; Lee, Kong Aik; Qian, Yanmin; Li, Haizhou

Speaker individuality information is among the most critical elements within speech signals. When modeled thoroughly and accurately, this information can be utilized in various intelligent speech applications, such as speaker recognition, speaker d

External link: http://arxiv.org/abs/2407.15188

Report

Generating Speakers by Prompting Listener Impressions for Pre-trained Multi-Speaker Text-to-Speech Systems

Authors: Chen, Zhengyang; Liu, Xuechen; Cooper, Erica; Yamagishi, Junichi; Qian, Yanmin

This paper proposes a speech synthesis system that allows users to specify and control the acoustic characteristics of a speaker by means of prompts describing the speaker's traits of synthesized speech. Unlike previous approaches, our method utilize

External link: http://arxiv.org/abs/2406.08812

Report

Target Speech Diarization with Multimodal Prompts

Authors: Jiang, Yidi; Tao, Ruijie; Chen, Zhengyang; Qian, Yanmin; Li, Haizhou

Traditional speaker diarization seeks to detect "who spoke when" according to speaker characteristics. Extending to target speech diarization, we detect "when target event occurs" according to the semantic characteristics of speech. We propose a

External link: http://arxiv.org/abs/2406.07198

Report

Prompt-driven Target Speech Diarization

Authors: Jiang, Yidi; Chen, Zhengyang; Tao, Ruijie; Deng, Liqun; Qian, Yanmin; Li, Haizhou

We introduce a novel task named "target speech diarization", which seeks to determine "when target event occurred" within an audio signal. We devise a neural architecture called Prompt-driven Target Speech Diarization (PTSD) that works with diverse

External link: http://arxiv.org/abs/2310.14823

Report

Leveraging In-the-Wild Data for Effective Self-Supervised Pretraining in Speaker Recognition

Authors: Wang, Shuai; Bai, Qibing; Liu, Qi; Yu, Jianwei; Chen, Zhengyang; Han, Bing; Qian, Yanmin; Li, Haizhou

Current speaker recognition systems primarily rely on supervised approaches, constrained by the scale of labeled datasets. To boost the system performance, researchers leverage large pretrained models such as WavLM to transfer learned high-level feat

External link: http://arxiv.org/abs/2309.11730

Report

Attention-based Encoder-Decoder End-to-End Neural Diarization with Embedding Enhancer

Authors: Chen, Zhengyang; Han, Bing; Wang, Shuai; Qian, Yanmin

Deep neural network-based systems have significantly improved the performance of speaker diarization tasks. However, end-to-end neural diarization (EEND) systems often struggle to generalize to scenarios with an unseen number of speakers, while targe

External link: http://arxiv.org/abs/2309.06672

Report

Exploring Binary Classification Loss For Speaker Verification

Authors: Han, Bing; Chen, Zhengyang; Qian, Yanmin

The mismatch between closed-set training and open-set testing usually leads to significant performance degradation for the speaker verification task. For existing loss functions, metric learning-based objectives depend strongly on searching effective pair

External link: http://arxiv.org/abs/2307.08205

Report

Wespeaker baselines for VoxSRC2023

Authors: Wang, Shuai; Liang, Chengdong; Xiang, Xu; Han, Bing; Chen, Zhengyang; Wang, Hongji; Ding, Wen

This report showcases the results achieved using the Wespeaker toolkit for the VoxSRC2023 Challenge. Our aim is to provide participants, especially those with limited experience, with clear and straightforward guidelines to develop their initial syst

External link: http://arxiv.org/abs/2306.15161
