Search results

Report

Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation

Authors: Li, Jiaqi; Wang, Dongmei; Wang, Xiaofei; Qian, Yao; Zhou, Long; Liu, Shujie; Yousefi, Midia; Li, Canrun; Tsai, Chung-Hsien; Xiao, Zhen; Liu, Yanqing; Chen, Junkun; Zhao, Sheng; Li, Jinyu; Wu, Zhizheng; Zeng, Michael

Neural audio codec tokens serve as the fundamental building blocks for speech language model (SLM)-based speech generation. However, there is no systematic understanding on how the codec system affects the speech generation performance of the SLM. In

External link: http://arxiv.org/abs/2409.04016

Report

Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech

Authors: Wu, Haibin; Wang, Xiaofei; Eskimez, Sefik Emre; Thakker, Manthan; Tompkins, Daniel; Tsai, Chung-Hsien; Li, Canrun; Xiao, Zhen; Zhao, Sheng; Li, Jinyu; Kanda, Naoyuki

People change their tones of voice, often accompanied by nonverbal vocalizations (NVs) such as laughter and cries, to convey rich emotions. However, most text-to-speech (TTS) systems lack the capability to generate speech with rich emotions, includin

External link: http://arxiv.org/abs/2407.12229

Report

E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS

Authors: Eskimez, Sefik Emre; Wang, Xiaofei; Thakker, Manthan; Li, Canrun; Tsai, Chung-Hsien; Xiao, Zhen; Yang, Hemin; Zhu, Zirun; Tang, Min; Tan, Xu; Liu, Yanqing; Zhao, Sheng; Kanda, Naoyuki

This paper introduces Embarrassingly Easy Text-to-Speech (E2 TTS), a fully non-autoregressive zero-shot text-to-speech system that offers human-level naturalness and state-of-the-art speaker similarity and intelligibility. In the E2 TTS framework, th

External link: http://arxiv.org/abs/2406.18009

Report

Total-Duration-Aware Duration Modeling for Text-to-Speech Systems

Authors: Eskimez, Sefik Emre; Wang, Xiaofei; Thakker, Manthan; Tsai, Chung-Hsien; Li, Canrun; Xiao, Zhen; Yang, Hemin; Zhu, Zirun; Tang, Min; Li, Jinyu; Zhao, Sheng; Kanda, Naoyuki

Accurate control of the total duration of generated speech by adjusting the speech rate is crucial for various text-to-speech (TTS) applications. However, the impact of adjusting the speech rate on speech quality, such as intelligibility and speaker

External link: http://arxiv.org/abs/2406.04281

Report

Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like

Authors: Kanda, Naoyuki; Wang, Xiaofei; Eskimez, Sefik Emre; Thakker, Manthan; Yang, Hemin; Zhu, Zirun; Tang, Min; Li, Canrun; Tsai, Chung-Hsien; Xiao, Zhen; Xia, Yufei; Li, Jinzhu; Liu, Yanqing; Zhao, Sheng; Zeng, Michael

Laughter is one of the most expressive and natural aspects of human speech, conveying emotions, social cues, and humor. However, most text-to-speech (TTS) systems lack the ability to produce realistic and appropriate laughter sounds, limiting their a

External link: http://arxiv.org/abs/2402.07383
