Showing 1 - 10 of 185 for the search: "Qian, Yanmin"
Diffusion-based generative models (DGMs) have recently attracted attention in speech enhancement (SE) research, as previous works have shown remarkable generalization capability. However, DGMs are also computationally intensive, as they usually require…
External link:
http://arxiv.org/abs/2406.13471
Authors:
Jiang, Anbai, Han, Bing, Lv, Zhiqiang, Deng, Yufeng, Zhang, Wei-Qiang, Chen, Xie, Qian, Yanmin, Liu, Jia, Fan, Pingyi
Large pre-trained models have demonstrated dominant performance in multiple areas, where consistency between pre-training and fine-tuning is the key to success. However, few works have reported satisfactory results with pre-trained models for the machi…
External link:
http://arxiv.org/abs/2406.11364
This paper proposes a speech synthesis system that allows users to specify and control the acoustic characteristics of a speaker by means of prompts describing the speaker traits of the synthesized speech. Unlike previous approaches, our method utilize…
External link:
http://arxiv.org/abs/2406.08812
Traditional speaker diarization seeks to detect "who spoke when" according to speaker characteristics. Extending this to target speech diarization, we detect "when the target event occurs" according to the semantic characteristics of speech. We propose a…
External link:
http://arxiv.org/abs/2406.07198
Modern speaker verification (SV) systems typically demand expensive storage and computing resources, thereby hindering their deployment on mobile devices. In this paper, we explore adaptive neural network quantization for lightweight speaker verifica…
External link:
http://arxiv.org/abs/2406.05359
Authors:
Zhang, Wangyou, Scheibler, Robin, Saijo, Kohei, Cornell, Samuele, Li, Chenda, Ni, Zhaoheng, Kumar, Anurag, Pirklbauer, Jan, Sach, Marvin, Watanabe, Shinji, Fingscheidt, Tim, Qian, Yanmin
The last decade has witnessed significant advancements in deep learning-based speech enhancement (SE). However, most existing SE research has limitations in its coverage of SE sub-tasks, data diversity and amount, and evaluation metrics. To fill this…
External link:
http://arxiv.org/abs/2406.04660
Deep learning-based speech enhancement (SE) models have achieved impressive performance over the past decade. Numerous advanced architectures have been designed to deliver state-of-the-art performance; however, their scalability potential remains unrev…
External link:
http://arxiv.org/abs/2406.04269
Authors:
Le, Chenyang, Qian, Yao, Wang, Dongmei, Zhou, Long, Liu, Shujie, Wang, Xiaofei, Yousefi, Midia, Qian, Yanmin, Li, Jinyu, Zhao, Sheng, Zeng, Michael
There is rising research interest in directly translating speech from one language to another, known as end-to-end speech-to-speech translation. However, most end-to-end models struggle to outperform cascade models, i.e., a pipeli…
External link:
http://arxiv.org/abs/2405.17809
Parameter quantization for Large Language Models (LLMs) has recently attracted increasing attention for reducing memory costs and improving computational efficiency. Early approaches have been widely adopted. However, the existing methods suffer from…
External link:
http://arxiv.org/abs/2405.17233
We present GStalker, a 3D audio-driven talking face generation model with Gaussian Splatting, enabling both fast training (40 minutes) and real-time rendering (125 FPS) from a 3-5 minute video as training material, in comparison with previous 2D and…
External link:
http://arxiv.org/abs/2404.19040