Single-Codec: Single-Codebook Speech Codec towards High-Performance Speech Generation

Author:	Li, Hanzhao; Xue, Liumeng; Guo, Haohan; Zhu, Xinfa; Lv, Yuanjun; Xie, Lei; Chen, Yunlin; Yin, Hao; Li, Zhifei
Year of publication:	2024
Subject:	Electrical Engineering and Systems Science - Audio and Speech Processing
Document type:	Working Paper
Description:	The multi-codebook speech codec enables the application of large language models (LLMs) in TTS but bottlenecks efficiency and robustness due to multi-sequence prediction. To avoid this obstacle, we propose Single-Codec, a single-codebook single-sequence codec, which employs a disentangled VQ-VAE to decouple speech into a time-invariant embedding and a phonetically-rich discrete sequence. Furthermore, the encoder is enhanced with 1) contextual modeling with a BLSTM module to exploit the temporal information, 2) a hybrid sampling module to alleviate distortion from upsampling and downsampling, and 3) a resampling module to encourage discrete units to carry more phonetic information. Compared with multi-codebook codecs, e.g., EnCodec and TiCodec, Single-Codec demonstrates higher reconstruction quality with a lower bandwidth of only 304 bps. The effectiveness of Single-Codec is further validated by LLM-TTS experiments, showing improved naturalness and intelligibility. Comment: Accepted by Interspeech 2024
Database:	arXiv
External link:	http://arxiv.org/abs/2406.07422