Search results

Report

Adaptive Model Predictive Control for Differential-Algebraic Systems towards a Higher Path Accuracy for Physically Coupled Robots

Authors: Ye, Xin; Handwerker, Karl; Hohmann, Sören

The physical coupling between robots has the potential to improve the capabilities of multi-robot systems in challenging manufacturing processes. However, the path tracking accuracy of physically coupled robots is not studied adequately, especially c

External link: http://arxiv.org/abs/2412.03387

Report

A Neural Denoising Vocoder for Clean Waveform Generation from Noisy Mel-Spectrogram based on Amplitude and Phase Predictions

Authors: Du, Hui-Peng; Lu, Ye-Xin; Ai, Yang; Ling, Zhen-Hua

This paper proposes a novel neural denoising vocoder that can generate clean speech waveforms from noisy mel-spectrograms. The proposed neural denoising vocoder consists of two components, i.e., a spectrum predictor and an enhancement module. The spec

External link: http://arxiv.org/abs/2411.12268

Report

ESTVocoder: An Excitation-Spectral-Transformed Neural Vocoder Conditioned on Mel Spectrogram

Authors: Jiang, Xiao-Hang; Du, Hui-Peng; Ai, Yang; Lu, Ye-Xin; Ling, Zhen-Hua

This paper proposes ESTVocoder, a novel excitation-spectral-transformed neural vocoder within the framework of source-filter theory. The ESTVocoder transforms the amplitude and phase spectra of the excitation into the corresponding speech amplitude a

External link: http://arxiv.org/abs/2411.11258

Report

SAMOS: A Neural MOS Prediction Model Leveraging Semantic Representations and Acoustic Features

Authors: Shi, Yu-Fei; Ai, Yang; Lu, Ye-Xin; Du, Hui-Peng; Ling, Zhen-Hua

Assessing the naturalness of speech using mean opinion score (MOS) prediction models has positive implications for the automatic evaluation of speech synthesis systems. Early MOS prediction models took the raw waveform or amplitude spectrum of speech

External link: http://arxiv.org/abs/2411.11232

Report

Pitch-and-Spectrum-Aware Singing Quality Assessment with Bias Correction and Model Fusion

Authors: Shi, Yu-Fei; Ai, Yang; Lu, Ye-Xin; Du, Hui-Peng; Ling, Zhen-Hua

We participated in track 2 of the VoiceMOS Challenge 2024, which aimed to predict the mean opinion score (MOS) of singing samples. Our submission secured the first place among all participating teams, excluding the official baseline. In this paper, w

External link: http://arxiv.org/abs/2411.11123

Report

MTA: Multimodal Task Alignment for BEV Perception and Captioning

Authors: Ma, Yunsheng; Yaman, Burhaneddin; Ye, Xin; Tao, Feng; Mallik, Abhirup; Wang, Ziran; Ren, Liu

Bird's eye view (BEV)-based 3D perception plays a crucial role in autonomous driving applications. The rise of large language models has spurred interest in BEV-based captioning to understand object behavior in the surrounding environment. However, e

External link: http://arxiv.org/abs/2411.10639

Report

Selecting Between BERT and GPT for Text Classification in Political Science Research

Authors: Wang, Yu; Qu, Wen; Ye, Xin

Political scientists often grapple with data scarcity in text classification. Recently, fine-tuned BERT models and their variants have gained traction as effective solutions to address this issue. In this study, we investigate the potential of GPT-ba

External link: http://arxiv.org/abs/2411.05050

Report

MDCTCodec: A Lightweight MDCT-based Neural Audio Codec towards High Sampling Rate and Low Bitrate Scenarios

Authors: Jiang, Xiao-Hang; Ai, Yang; Zheng, Rui-Chen; Du, Hui-Peng; Lu, Ye-Xin; Ling, Zhen-Hua

In this paper, we propose MDCTCodec, an efficient lightweight end-to-end neural audio codec based on the modified discrete cosine transform (MDCT). The encoder takes the MDCT spectrum of audio as input, encoding it into a continuous latent code which

External link: http://arxiv.org/abs/2411.00464

Report

Stage-Wise and Prior-Aware Neural Speech Phase Prediction

Authors: Liu, Fei; Ai, Yang; Du, Hui-Peng; Lu, Ye-Xin; Zheng, Rui-Chen; Ling, Zhen-Hua

This paper proposes a novel Stage-wise and Prior-aware Neural Speech Phase Prediction (SP-NSPP) model, which predicts the phase spectrum from input amplitude spectrum by two-stage neural networks. In the initial prior-construction stage, we prelimina

External link: http://arxiv.org/abs/2410.04990

Report

Multi-Stage Speech Bandwidth Extension with Flexible Sampling Rate Control

Authors: Lu, Ye-Xin; Ai, Yang; Sheng, Zheng-Yan; Ling, Zhen-Hua

The majority of existing speech bandwidth extension (BWE) methods operate under the constraint of fixed source and target sampling rates, which limits their flexibility in practical applications. In this paper, we propose a multi-stage speech BWE mod

External link: http://arxiv.org/abs/2406.02250
