Showing 1 - 10 of 2,819 for the search: '"DU, HUI"'
This paper proposes an Incremental Disentanglement-based Environment-Aware zero-shot text-to-speech (TTS) method, dubbed IDEA-TTS, that can synthesize speech for unseen speakers while preserving the acoustic characteristics of a given environment ref
External link:
http://arxiv.org/abs/2412.16977
Author:
Zhang, Fan, Zhao, Siyuan, Ji, Naye, Wang, Zhaohan, Wu, Jingmei, Gao, Fuxing, Ye, Zhenqing, Yan, Leyao, Dai, Lanxin, Geng, Weidong, Lyu, Xin, Zhao, Bozuo, Yu, Dingguo, Du, Hui, Hu, Bin
Speech-driven gesture generation using transformer-based generative models represents a rapidly advancing area within virtual human creation. However, existing models face significant challenges due to their quadratic time and space complexities, lim
External link:
http://arxiv.org/abs/2411.16729
This paper proposes a novel neural denoising vocoder that can generate clean speech waveforms from noisy mel-spectrograms. The proposed neural denoising vocoder consists of two components, i.e., a spectrum predictor and an enhancement module. The spec
External link:
http://arxiv.org/abs/2411.12268
This paper proposes ESTVocoder, a novel excitation-spectral-transformed neural vocoder within the framework of source-filter theory. The ESTVocoder transforms the amplitude and phase spectra of the excitation into the corresponding speech amplitude a
External link:
http://arxiv.org/abs/2411.11258
Assessing the naturalness of speech using mean opinion score (MOS) prediction models has positive implications for the automatic evaluation of speech synthesis systems. Early MOS prediction models took the raw waveform or amplitude spectrum of speech
External link:
http://arxiv.org/abs/2411.11232
We participated in track 2 of the VoiceMOS Challenge 2024, which aimed to predict the mean opinion score (MOS) of singing samples. Our submission secured the first place among all participating teams, excluding the official baseline. In this paper, w
External link:
http://arxiv.org/abs/2411.11123
In this paper, we propose MDCTCodec, an efficient lightweight end-to-end neural audio codec based on the modified discrete cosine transform (MDCT). The encoder takes the MDCT spectrum of audio as input, encoding it into a continuous latent code which
External link:
http://arxiv.org/abs/2411.00464
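The MDCTCodec entry above takes the MDCT spectrum of audio as its coding object. As a point of reference, the textbook forward MDCT (2N samples in, N coefficients out) can be sketched in pure Python; this is only the standard transform definition, not the paper's codec, and the function name is illustrative:

```python
import math

def mdct(frame):
    """Forward MDCT: a frame of 2*N samples -> N coefficients.

    Standard definition:
        X[k] = sum_{n=0}^{2N-1} x[n] * cos((pi/N) * (n + 1/2 + N/2) * (k + 1/2))
    """
    N = len(frame) // 2
    return [
        sum(frame[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
            for n in range(2 * N))
        for k in range(N)
    ]
```

In practice the transform is applied to windowed, 50%-overlapping frames so that overlap-add of the inverse transform cancels the time-domain aliasing.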
This paper proposes a novel neural audio codec, named APCodec+, which is an improved version of APCodec. The APCodec+ takes the audio amplitude and phase spectra as the coding object, and employs an adversarial training strategy. Innovatively, we pro
External link:
http://arxiv.org/abs/2410.22807
Current neural audio codecs typically use residual vector quantization (RVQ) to discretize speech signals. However, they often experience codebook collapse, which reduces the effective codebook size and leads to suboptimal performance. To address thi
External link:
http://arxiv.org/abs/2410.12359
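The entry above addresses codebook collapse in residual vector quantization (RVQ). A minimal sketch of the plain RVQ encode/decode loop it builds on — brute-force nearest-neighbour search over small list-based codebooks, with illustrative names, not the paper's method:

```python
def nearest(codebook, vec):
    """Index of the codebook entry closest to vec (squared Euclidean)."""
    best_i, best_d = 0, float("inf")
    for i, entry in enumerate(codebook):
        d = sum((a - b) ** 2 for a, b in zip(entry, vec))
        if d < best_d:
            best_i, best_d = i, d
    return best_i

def rvq_encode(codebooks, vec):
    """Quantize vec with a stack of codebooks; each stage codes the residual."""
    residual = list(vec)
    indices = []
    for cb in codebooks:
        i = nearest(cb, residual)
        indices.append(i)
        residual = [r - c for r, c in zip(residual, cb[i])]
    return indices

def rvq_decode(codebooks, indices):
    """Reconstruct by summing the selected entry from each codebook."""
    out = [0.0] * len(codebooks[0][0])
    for cb, i in zip(codebooks, indices):
        out = [o + c for o, c in zip(out, cb[i])]
    return out
```

Collapse occurs when later stages keep selecting only a few entries, shrinking the effective codebook size — the failure mode this paper targets.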
This paper proposes a novel Stage-wise and Prior-aware Neural Speech Phase Prediction (SP-NSPP) model, which predicts the phase spectrum from input amplitude spectrum by two-stage neural networks. In the initial prior-construction stage, we prelimina
External link:
http://arxiv.org/abs/2410.04990