Search results - "Wang, Dingdong"

Report

A Comparative Study of Discrete Speech Tokens for Semantic-Related Tasks with Large Language Models

Authors: Wang, Dingdong, Cui, Mingyu, Yang, Dongchao, Chen, Xueyuan, Meng, Helen

With the rise of Speech Large Language Models (Speech LLMs), there has been growing interest in discrete speech tokens for their ability to integrate with text-based tokens seamlessly. Compared to most studies that focus on continuous speech features

External link: http://arxiv.org/abs/2411.08742

Report

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

GPT-4o, an omni-modal model that enables vocal conversations with diverse emotions and tones, marks a milestone for omni-modal foundation models. However, empowering Large Language Models to perceive and generate images, texts, and speeches end-to-en

External link: http://arxiv.org/abs/2409.18042

Report

Exploring SSL Discrete Tokens for Multilingual ASR

Authors: Cui, Mingyu, Tan, Daxin, Yang, Yifan, Wang, Dingdong, Wang, Huimeng, Chen, Xiao, Chen, Xie, Liu, Xunying

With the advancement of Self-supervised Learning (SSL) in speech-related tasks, there has been growing interest in utilizing discrete tokens generated by SSL for automatic speech recognition (ASR), as they offer faster processing techniques. However,

External link: http://arxiv.org/abs/2409.08805

Report

CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction

Authors: Chen, Xueyuan, Yang, Dongchao, Wang, Dingdong, Wu, Xixin, Wu, Zhiyong, Meng, Helen

Dysarthric speech reconstruction (DSR) aims to transform dysarthric speech into normal speech. It still suffers from low speaker similarity and poor prosody naturalness. In this paper, we propose a multi-modal DSR model by leveraging neural codec lan

External link: http://arxiv.org/abs/2406.08336

Report

SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models

Authors: Yang, Dongchao, Wang, Dingdong, Guo, Haohan, Chen, Xueyuan, Wu, Xixin, Meng, Helen

In this study, we propose a simple and efficient Non-Autoregressive (NAR) text-to-speech (TTS) system based on diffusion, named SimpleSpeech. Its simpleness shows in three aspects: (1) It can be trained on the speech-only dataset, without any alignme

External link: http://arxiv.org/abs/2406.02328
