Výsledky vyhledávání - "Chen, Sanyuan"

Report

Autoregressive Speech Synthesis without Vector Quantization

Autor: Meng, Lingwei, Zhou, Long, Liu, Shujie, Chen, Sanyuan, Han, Bing, Hu, Shujie, Liu, Yanqing, Li, Jinyu, Zhao, Sheng, Wu, Xixin, Meng, Helen, Wei, Furu

We present MELLE, a novel continuous-valued tokens based language modeling approach for text to speech synthesis (TTS). MELLE autoregressively generates continuous mel-spectrogram frames directly from text condition, bypassing the need for vector qua

Externí odkaz: http://arxiv.org/abs/2407.08551

Zobrazit plný text záznamu

Report

VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment

Autor: Han, Bing, Zhou, Long, Liu, Shujie, Chen, Sanyuan, Meng, Lingwei, Qian, Yanming, Liu, Yanqing, Zhao, Sheng, Li, Jinyu, Wei, Furu

With the help of discrete neural audio codecs, large language models (LLM) have increasingly been recognized as a promising methodology for zero-shot Text-to-Speech (TTS) synthesis. However, sampling based decoding strategies bring astonishing divers

Externí odkaz: http://arxiv.org/abs/2406.07855

Zobrazit plný text záznamu

Report

VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers

Autor: Chen, Sanyuan, Liu, Shujie, Zhou, Long, Liu, Yanqing, Tan, Xu, Li, Jinyu, Zhao, Sheng, Qian, Yao, Wei, Furu

This paper introduces VALL-E 2, the latest advancement in neural codec language models that marks a milestone in zero-shot text-to-speech synthesis (TTS), achieving human parity for the first time. Based on its predecessor, VALL-E, the new iteration

Externí odkaz: http://arxiv.org/abs/2406.05370

Zobrazit plný text záznamu

Report

WavLLM: Towards Robust and Adaptive Speech Large Language Model

Autor: Hu, Shujie, Zhou, Long, Liu, Shujie, Chen, Sanyuan, Hao, Hongkun, Pan, Jing, Liu, Xunying, Li, Jinyu, Sivasankaran, Sunit, Liu, Linquan, Wei, Furu

The recent advancements in large language models (LLMs) have revolutionized the field of natural language processing, progressively broadening their scope to multimodal perception and generation. However, effectively integrating listening capabilitie

Externí odkaz: http://arxiv.org/abs/2404.00656

Zobrazit plný text záznamu

Report

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer

Autor: Wang, Xiaofei, Thakker, Manthan, Chen, Zhuo, Kanda, Naoyuki, Eskimez, Sefik Emre, Chen, Sanyuan, Tang, Min, Liu, Shujie, Li, Jinyu, Yoshioka, Takuya

Recent advancements in generative speech models based on audio-text prompts have enabled remarkable innovations like high-quality zero-shot text-to-speech. However, existing models still face limitations in handling diverse audio-text speech generati

Externí odkaz: http://arxiv.org/abs/2308.06873

Zobrazit plný text záznamu

Report

Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling

Autor: Zhang, Ziqiang, Zhou, Long, Wang, Chengyi, Chen, Sanyuan, Wu, Yu, Liu, Shujie, Chen, Zhuo, Liu, Yanqing, Wang, Huaming, Li, Jinyu, He, Lei, Zhao, Sheng, Wei, Furu

We propose a cross-lingual neural codec language model, VALL-E X, for cross-lingual speech synthesis. Specifically, we extend VALL-E and train a multi-lingual conditional codec language model to predict the acoustic token sequences of the target lang

Externí odkaz: http://arxiv.org/abs/2303.03926

Zobrazit plný text záznamu

Report

Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers

Autor: Wang, Chengyi, Chen, Sanyuan, Wu, Yu, Zhang, Ziqiang, Zhou, Long, Liu, Shujie, Chen, Zhuo, Liu, Yanqing, Wang, Huaming, Li, Jinyu, He, Lei, Zhao, Sheng, Wei, Furu

We introduce a language modeling approach for text to speech synthesis (TTS). Specifically, we train a neural codec language model (called Vall-E) using discrete codes derived from an off-the-shelf neural audio codec model, and regard TTS as a condit

Externí odkaz: http://arxiv.org/abs/2301.02111

Zobrazit plný text záznamu

Report

BEATs: Audio Pre-Training with Acoustic Tokenizers

Autor: Chen, Sanyuan, Wu, Yu, Wang, Chengyi, Liu, Shujie, Tompkins, Daniel, Chen, Zhuo, Wei, Furu

The massive growth of self-supervised learning (SSL) has been witnessed in language, vision, speech, and audio domains over the past few years. While discrete label prediction is widely adopted for other modalities, the state-of-the-art audio SSL mod

Externí odkaz: http://arxiv.org/abs/2212.09058

Zobrazit plný text záznamu

Report

TESSP: Text-Enhanced Self-Supervised Speech Pre-training

Autor: Yao, Zhuoyuan, Ren, Shuo, Chen, Sanyuan, Ma, Ziyang, Guo, Pengcheng, Xie, Lei

Self-supervised speech pre-training empowers the model with the contextual structure inherent in the speech signal while self-supervised text pre-training empowers the model with linguistic information. Both of them are beneficial for downstream spee

Externí odkaz: http://arxiv.org/abs/2211.13443

Zobrazit plný text záznamu

Report

Exploring WavLM on Speech Enhancement

Autor: Song, Hyungchan, Chen, Sanyuan, Chen, Zhuo, Wu, Yu, Yoshioka, Takuya, Tang, Min, Shin, Jong Won, Liu, Shujie

There is a surge in interest in self-supervised learning approaches for end-to-end speech encoding in recent years as they have achieved great success. Especially, WavLM showed state-of-the-art performance on various speech processing tasks. To bette

Externí odkaz: http://arxiv.org/abs/2211.09988

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání