Showing 1 - 10 of 244
for search: "Qian, Yanmin"
Author:
Wang, Shuai, Zhang, Ke, Lin, Shaoxiong, Li, Junjie, Wang, Xuefei, Ge, Meng, Yu, Jianwei, Qian, Yanmin, Li, Haizhou
Target speaker extraction (TSE) focuses on isolating the speech of a specific target speaker from overlapped multi-talker speech, a typical setup in the cocktail party problem. In recent years, TSE has drawn increasing attention due to its potential…
External link:
http://arxiv.org/abs/2409.15799
Anomalous Sound Detection (ASD) has gained significant interest through the application of various Artificial Intelligence (AI) technologies in industrial settings. Despite its great potential, ASD systems can hardly be readily deployed in real…
External link:
http://arxiv.org/abs/2409.07016
Author:
Chen, Zhengyang, Wang, Shuai, Zhang, Mingyang, Liu, Xuechen, Yamagishi, Junichi, Qian, Yanmin
Voice conversion (VC) aims to modify the speaker's timbre while retaining the speech content. Previous approaches have tokenized the outputs of self-supervised models into semantic tokens, facilitating the disentanglement of speech content information. Recently, …
External link:
http://arxiv.org/abs/2409.05004
Speaker diarization is typically considered a discriminative task, using discriminative approaches to produce fixed diarization results. In this paper, we explore the use of neural network-based generative methods for speaker diarization for the first…
External link:
http://arxiv.org/abs/2409.04859
Speaker individuality information is among the most critical elements within speech signals. By thoroughly and accurately modeling this information, it can be utilized in various intelligent speech applications, such as speaker recognition, speaker d…
External link:
http://arxiv.org/abs/2407.15188
Diffusion-based generative models (DGMs) have recently attracted attention in speech enhancement (SE) research, as previous work has shown remarkable generalization capability. However, DGMs are also computationally intensive, as they usually require…
External link:
http://arxiv.org/abs/2406.13471
Author:
Jiang, Anbai, Han, Bing, Lv, Zhiqiang, Deng, Yufeng, Zhang, Wei-Qiang, Chen, Xie, Qian, Yanmin, Liu, Jia, Fan, Pingyi
Large pre-trained models have demonstrated dominant performance in multiple areas, where consistency between pre-training and fine-tuning is the key to success. However, few works have reported satisfactory results of pre-trained models for the machine…
External link:
http://arxiv.org/abs/2406.11364
This paper proposes a speech synthesis system that allows users to specify and control a speaker's acoustic characteristics by means of prompts describing the traits of the synthesized speech. Unlike previous approaches, our method utilizes…
External link:
http://arxiv.org/abs/2406.08812
Traditional speaker diarization seeks to detect "who spoke when" according to speaker characteristics. Extending to target speech diarization, we detect "when target event occurs" according to the semantic characteristics of speech. We propose a…
External link:
http://arxiv.org/abs/2406.07198
Modern speaker verification (SV) systems typically demand expensive storage and computing resources, thereby hindering their deployment on mobile devices. In this paper, we explore adaptive neural network quantization for lightweight speaker verification…
External link:
http://arxiv.org/abs/2406.05359