Showing 1 - 10 of 1,042
for search: '"Chen Xueyuan"'
With the rise of Speech Large Language Models (Speech LLMs), there has been growing interest in discrete speech tokens for their ability to integrate seamlessly with text-based tokens. Compared to most studies that focus on continuous speech features…
External link:
http://arxiv.org/abs/2411.08742
Graph contrastive learning has achieved great success in pre-training graph neural networks without ground-truth labels. Leading graph contrastive learning follows the classical scheme of contrastive learning, forcing the model to identify the essential…
External link:
http://arxiv.org/abs/2410.20356
Novel Class Discovery (NCD) involves identifying new categories within unlabeled data by utilizing knowledge acquired from previously established categories. However, existing NCD methods often struggle to maintain a balance between the performance o…
External link:
http://arxiv.org/abs/2407.17816
CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction
Dysarthric speech reconstruction (DSR) aims to transform dysarthric speech into normal speech. It still suffers from low speaker similarity and poor prosody naturalness. In this paper, we propose a multi-modal DSR model by leveraging neural codec lan…
External link:
http://arxiv.org/abs/2406.08336
In this study, we propose a simple and efficient Non-Autoregressive (NAR) text-to-speech (TTS) system based on diffusion, named SimpleSpeech. Its simplicity shows in three aspects: (1) it can be trained on a speech-only dataset, without any alignme…
External link:
http://arxiv.org/abs/2406.02328
Audio-visual target speech extraction (AV-TSE) is one of the enabling technologies in robotics and many audio-visual applications. One of the challenges of AV-TSE is how to effectively utilize audio-visual synchronization information in the process.
External link:
http://arxiv.org/abs/2403.16078
Author:
Chen, Xueyuan, Wang, Yuejiao, Wu, Xixin, Wang, Disong, Wu, Zhiyong, Liu, Xunying, Meng, Helen
Dysarthric speech reconstruction (DSR) aims to transform dysarthric speech into normal speech by improving its intelligibility and naturalness. This is a challenging task, especially for patients with severe dysarthria speaking in complex, noisy a…
External link:
http://arxiv.org/abs/2401.17796
The expressive quality of synthesized speech for audiobooks is limited by generalized model architectures and unbalanced style distribution in the training data. To address these issues, in this paper we propose a self-supervised style enhancing meth…
External link:
http://arxiv.org/abs/2312.12181
In contrastive learning, the choice of "view" controls the information that the representation captures and influences the performance of the model. However, leading graph contrastive learning methods generally produce views via random corruption o…
External link:
http://arxiv.org/abs/2305.04501