Showing 1 - 10 of 2,819 for the search: '"DU, HUI"'
This paper proposes an Incremental Disentanglement-based Environment-Aware zero-shot text-to-speech (TTS) method, dubbed IDEA-TTS, that can synthesize speech for unseen speakers while preserving the acoustic characteristics of a given environment ref
External link:
http://arxiv.org/abs/2412.16977
Author:
Zhang, Fan, Zhao, Siyuan, Ji, Naye, Wang, Zhaohan, Wu, Jingmei, Gao, Fuxing, Ye, Zhenqing, Yan, Leyao, Dai, Lanxin, Geng, Weidong, Lyu, Xin, Zhao, Bozuo, Yu, Dingguo, Du, Hui, Hu, Bin
Speech-driven gesture generation using transformer-based generative models represents a rapidly advancing area within virtual human creation. However, existing models face significant challenges due to their quadratic time and space complexities, lim
External link:
http://arxiv.org/abs/2411.16729
This paper proposes a novel neural denoising vocoder that can generate clean speech waveforms from noisy mel-spectrograms. The proposed neural denoising vocoder consists of two components, i.e., a spectrum predictor and an enhancement module. The spec
External link:
http://arxiv.org/abs/2411.12268
This paper proposes ESTVocoder, a novel excitation-spectral-transformed neural vocoder within the framework of source-filter theory. The ESTVocoder transforms the amplitude and phase spectra of the excitation into the corresponding speech amplitude a
External link:
http://arxiv.org/abs/2411.11258
Assessing the naturalness of speech using mean opinion score (MOS) prediction models has positive implications for the automatic evaluation of speech synthesis systems. Early MOS prediction models took the raw waveform or amplitude spectrum of speech
External link:
http://arxiv.org/abs/2411.11232
We participated in track 2 of the VoiceMOS Challenge 2024, which aimed to predict the mean opinion score (MOS) of singing samples. Our submission secured the first place among all participating teams, excluding the official baseline. In this paper, w
External link:
http://arxiv.org/abs/2411.11123
In this paper, we propose MDCTCodec, an efficient lightweight end-to-end neural audio codec based on the modified discrete cosine transform (MDCT). The encoder takes the MDCT spectrum of audio as input, encoding it into a continuous latent code which
External link:
http://arxiv.org/abs/2411.00464
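The MDCTCodec entry above takes the MDCT spectrum of audio as its coding object. As a point of reference, the textbook forward MDCT (2N samples in, N coefficients out) can be sketched in pure Python; this is only the standard transform definition, not the paper's codec, and the function name is illustrative:

```python
import math

def mdct(frame):
    """Forward MDCT: a frame of 2*N samples -> N coefficients.

    Standard definition:
        X[k] = sum_{n=0}^{2N-1} x[n] * cos((pi/N) * (n + 1/2 + N/2) * (k + 1/2))
    """
    N = len(frame) // 2
    return [
        sum(frame[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
            for n in range(2 * N))
        for k in range(N)
    ]
```

In practice the transform is applied to windowed, 50%-overlapping frames so that overlap-add of the inverse transform cancels the time-domain aliasing.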
This paper proposes a novel neural audio codec, named APCodec+, which is an improved version of APCodec. The APCodec+ takes the audio amplitude and phase spectra as the coding object, and employs an adversarial training strategy. Innovatively, we pro
External link:
http://arxiv.org/abs/2410.22807
Current neural audio codecs typically use residual vector quantization (RVQ) to discretize speech signals. However, they often experience codebook collapse, which reduces the effective codebook size and leads to suboptimal performance. To address thi
External link:
http://arxiv.org/abs/2410.12359
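The entry above addresses codebook collapse in residual vector quantization (RVQ). A minimal sketch of the plain RVQ encode/decode loop it builds on — brute-force nearest-neighbour search over small list-based codebooks, with illustrative names, not the paper's method:

```python
def nearest(codebook, vec):
    """Index of the codebook entry closest to vec (squared Euclidean)."""
    best_i, best_d = 0, float("inf")
    for i, entry in enumerate(codebook):
        d = sum((a - b) ** 2 for a, b in zip(entry, vec))
        if d < best_d:
            best_i, best_d = i, d
    return best_i

def rvq_encode(codebooks, vec):
    """Quantize vec with a stack of codebooks; each stage codes the residual."""
    residual = list(vec)
    indices = []
    for cb in codebooks:
        i = nearest(cb, residual)
        indices.append(i)
        residual = [r - c for r, c in zip(residual, cb[i])]
    return indices

def rvq_decode(codebooks, indices):
    """Reconstruct by summing the selected entry from each codebook."""
    out = [0.0] * len(codebooks[0][0])
    for cb, i in zip(codebooks, indices):
        out = [o + c for o, c in zip(out, cb[i])]
    return out
```

Collapse occurs when later stages keep selecting only a few entries, shrinking the effective codebook size — the failure mode this paper targets.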
This paper proposes a novel Stage-wise and Prior-aware Neural Speech Phase Prediction (SP-NSPP) model, which predicts the phase spectrum from input amplitude spectrum by two-stage neural networks. In the initial prior-construction stage, we prelimina
External link:
http://arxiv.org/abs/2410.04990