Showing 1 - 10 of 185 for the search: "Qian, Yanmin"
Diffusion-based generative models (DGMs) have recently attracted attention in speech enhancement (SE) research, as previous works have shown remarkable generalization capability. However, DGMs are also computationally intensive, as they usually require…
External link:
http://arxiv.org/abs/2406.13471
Authors:
Jiang, Anbai, Han, Bing, Lv, Zhiqiang, Deng, Yufeng, Zhang, Wei-Qiang, Chen, Xie, Qian, Yanmin, Liu, Jia, Fan, Pingyi
Large pre-trained models have demonstrated dominant performance in multiple areas, where consistency between pre-training and fine-tuning is the key to success. However, few works have reported satisfactory results with pre-trained models for the machi…
External link:
http://arxiv.org/abs/2406.11364
This paper proposes a speech synthesis system that allows users to specify and control the acoustic characteristics of a speaker by means of prompts describing the speaker traits of the synthesized speech. Unlike previous approaches, our method utilize…
External link:
http://arxiv.org/abs/2406.08812
Traditional speaker diarization seeks to detect "who spoke when" according to speaker characteristics. Extending this to target speech diarization, we detect "when the target event occurs" according to the semantic characteristics of speech. We propose a…
External link:
http://arxiv.org/abs/2406.07198
Modern speaker verification (SV) systems typically demand expensive storage and computing resources, thereby hindering their deployment on mobile devices. In this paper, we explore adaptive neural network quantization for lightweight speaker verifica…
External link:
http://arxiv.org/abs/2406.05359
Authors:
Zhang, Wangyou, Scheibler, Robin, Saijo, Kohei, Cornell, Samuele, Li, Chenda, Ni, Zhaoheng, Kumar, Anurag, Pirklbauer, Jan, Sach, Marvin, Watanabe, Shinji, Fingscheidt, Tim, Qian, Yanmin
The last decade has witnessed significant advancements in deep learning-based speech enhancement (SE). However, most existing SE research has limitations in its coverage of SE sub-tasks, data diversity and amount, and evaluation metrics. To fill this…
External link:
http://arxiv.org/abs/2406.04660
Deep learning-based speech enhancement (SE) models have achieved impressive performance over the past decade. Numerous advanced architectures have been designed to deliver state-of-the-art performance; however, their scalability potential remains unrev…
External link:
http://arxiv.org/abs/2406.04269
Authors:
Le, Chenyang, Qian, Yao, Wang, Dongmei, Zhou, Long, Liu, Shujie, Wang, Xiaofei, Yousefi, Midia, Qian, Yanmin, Li, Jinyu, Zhao, Sheng, Zeng, Michael
There is rising research interest in directly translating speech from one language to another, known as end-to-end speech-to-speech translation. However, most end-to-end models struggle to outperform cascade models, i.e., a pipeli…
External link:
http://arxiv.org/abs/2405.17809
Parameter quantization for Large Language Models (LLMs) has recently attracted increasing attention for reducing memory costs and improving computational efficiency. Early approaches have been widely adopted. However, the existing methods suffer from…
External link:
http://arxiv.org/abs/2405.17233
We present GStalker, a 3D audio-driven talking face generation model with Gaussian Splatting, enabling both fast training (40 minutes) and real-time rendering (125 FPS) from a 3-5 minute video as training material, in comparison with previous 2D and…
External link:
http://arxiv.org/abs/2404.19040