Search results - "Xie, Fenglong"

Report

Speaking from Coarse to Fine: Improving Neural Codec Language Model via Multi-Scale Speech Coding and Generation

Authors: Guo, Haohan; Xie, Fenglong; Yang, Dongchao; Wu, Xixin; Meng, Helen

The neural codec language model (CLM) has demonstrated remarkable performance in text-to-speech (TTS) synthesis. However, troubled by "recency bias", CLM lacks sufficient attention to coarse-grained information at a higher temporal scale, often prod

External link: http://arxiv.org/abs/2409.11630

Report

SoCodec: A Semantic-Ordered Multi-Stream Speech Codec for Efficient Language Model Based Text-to-Speech Synthesis

Authors: Guo, Haohan; Xie, Fenglong; Xie, Kun; Yang, Dongchao; Guo, Dake; Wu, Xixin; Meng, Helen

The long speech sequence has been troubling language models (LM) based TTS approaches in terms of modeling complexity and efficiency. This work proposes SoCodec, a semantic-ordered multi-stream speech codec, to address this issue. It compresses speec

External link: http://arxiv.org/abs/2409.00933

Report

Addressing Index Collapse of Large-Codebook Speech Tokenizer with Dual-Decoding Product-Quantized Variational Auto-Encoder

Authors: Guo, Haohan; Xie, Fenglong; Yang, Dongchao; Lu, Hui; Wu, Xixin; Meng, Helen

VQ-VAE, as a mainstream approach of speech tokenizer, has been troubled by "index collapse", where only a small number of codewords are activated in large codebooks. This work proposes product-quantized (PQ) VAE with more codebooks but fewer codewo

External link: http://arxiv.org/abs/2406.02940

Report

QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning

Authors: Guo, Haohan; Xie, Fenglong; Kang, Jiawen; Xiao, Yujia; Wu, Xixin; Meng, Helen

This paper proposes a novel semi-supervised TTS framework, QS-TTS, to improve TTS quality with lower supervised data requirements via Vector-Quantized Self-Supervised Speech Representation Learning (VQ-S3RL) utilizing more unlabeled speech audio. Thi

External link: http://arxiv.org/abs/2309.00126

Report

Towards High-Quality Neural TTS for Low-Resource Languages by Learning Compact Speech Representations

Authors: Guo, Haohan; Xie, Fenglong; Wu, Xixin; Lu, Hui; Meng, Helen

This paper aims to enhance low-resource TTS by reducing training data requirements using compact speech representations. A Multi-Stage Multi-Codebook (MSMC) VQ-GAN is trained to learn the representation, MSMCR, and decode it to waveforms. Subsequentl

External link: http://arxiv.org/abs/2210.15131

Report

A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS

Authors: Guo, Haohan; Xie, Fenglong; Soong, Frank K.; Wu, Xixin; Meng, Helen

We propose a Multi-Stage, Multi-Codebook (MSMC) approach to high-performance neural TTS synthesis. A vector-quantized, variational autoencoder (VQ-VAE) based feature analyzer is used to encode Mel spectrograms of speech training data by down-sampling

External link: http://arxiv.org/abs/2209.10887

Report

Nana-HDR: A Non-attentive Non-autoregressive Hybrid Model for TTS

Authors: Lin, Shilun; Su, Wenchao; Meng, Li; Xie, Fenglong; Li, Xinhui; Lu, Li

This paper presents Nana-HDR, a new non-attentive non-autoregressive model with hybrid Transformer-based Dense-fuse encoder and RNN-based decoder for TTS. It mainly consists of three parts: Firstly, a novel Dense-fuse encoder with dense connections b

External link: http://arxiv.org/abs/2109.13673

Academic article

Wearable activity tracker study exploring rheumatoid arthritis patients’ disease activity using patient-reported outcome measures, clinical measures, and biometric sensor data (the wear study)

Authors: Stradford, Laura; Curtis, Jeffrey R.; Zueger, Patrick; Xie, Fenglong; Curtis, David; Gavigan, Kelly; Clinton, Cassie; Venkatachalam, Shilpa; Rivera, Esteban; Nowell, W. Benjamin

Published in: Contemporary Clinical Trials Communications, April 2024, 38

Academic article

Tri-stage training with language-specific encoder and bilingual acoustic learner for code-switching speech recognition

Authors: Wang, Xuefei; Jin, Yuan; Xie, Fenglong; Long, Yanhua

Published in: Applied Acoustics, 15 March 2024, 218

Report

Triple M: A Practical Text-to-speech Synthesis System With Multi-guidance Attention And Multi-band Multi-time LPCNet

Authors: Lin, Shilun; Xie, Fenglong; Meng, Li; Li, Xinhui; Lu, Li

In this work, a robust and efficient text-to-speech (TTS) synthesis system named Triple M is proposed for large-scale online application. The key components of Triple M are: 1) A sequence-to-sequence model adopts a novel multi-guidance attention to t

External link: http://arxiv.org/abs/2102.00247
