Showing 1 - 10 of 244
for search: "Qian, Yanmin"
Author:
Wang, Shuai, Zhang, Ke, Lin, Shaoxiong, Li, Junjie, Wang, Xuefei, Ge, Meng, Yu, Jianwei, Qian, Yanmin, Li, Haizhou
Target speaker extraction (TSE) focuses on isolating the speech of a specific target speaker from overlapped multi-talker speech, a typical setup in the cocktail party problem. In recent years, TSE has drawn increasing attention due to its potential…
External link:
http://arxiv.org/abs/2409.15799
Anomalous Sound Detection (ASD) has gained significant interest through the application of various Artificial Intelligence (AI) technologies in industrial settings. Despite its great potential, ASD systems can hardly be readily deployed in real…
External link:
http://arxiv.org/abs/2409.07016
Author:
Chen, Zhengyang, Wang, Shuai, Zhang, Mingyang, Liu, Xuechen, Yamagishi, Junichi, Qian, Yanmin
Voice conversion (VC) aims to modify the speaker's timbre while retaining the speech content. Previous approaches have tokenized the outputs of self-supervised models into semantic tokens, facilitating the disentanglement of speech content information. Recently, …
External link:
http://arxiv.org/abs/2409.05004
Speaker diarization is typically considered a discriminative task, using discriminative approaches to produce fixed diarization results. In this paper, we explore the use of neural network-based generative methods for speaker diarization for the first…
External link:
http://arxiv.org/abs/2409.04859
Speaker individuality information is among the most critical elements within speech signals. By thoroughly and accurately modeling this information, it can be utilized in various intelligent speech applications, such as speaker recognition, speaker d…
External link:
http://arxiv.org/abs/2407.15188
Diffusion-based generative models (DGMs) have recently attracted attention in speech enhancement (SE) research, as previous work has shown remarkable generalization capability. However, DGMs are also computationally intensive, as they usually require…
External link:
http://arxiv.org/abs/2406.13471
Author:
Jiang, Anbai, Han, Bing, Lv, Zhiqiang, Deng, Yufeng, Zhang, Wei-Qiang, Chen, Xie, Qian, Yanmin, Liu, Jia, Fan, Pingyi
Large pre-trained models have demonstrated dominant performance in multiple areas, where consistency between pre-training and fine-tuning is the key to success. However, few works have reported satisfactory results of pre-trained models for the machine…
External link:
http://arxiv.org/abs/2406.11364
This paper proposes a speech synthesis system that allows users to specify and control a speaker's acoustic characteristics by means of prompts describing the traits of the synthesized speech. Unlike previous approaches, our method utilizes…
External link:
http://arxiv.org/abs/2406.08812
Traditional speaker diarization seeks to detect "who spoke when" according to speaker characteristics. Extending to target speech diarization, we detect "when target event occurs" according to the semantic characteristics of speech. We propose a…
External link:
http://arxiv.org/abs/2406.07198
Modern speaker verification (SV) systems typically demand expensive storage and computing resources, thereby hindering their deployment on mobile devices. In this paper, we explore adaptive neural network quantization for lightweight speaker verification…
External link:
http://arxiv.org/abs/2406.05359