Showing 1 - 10 of 22 for the search: '"Zhu, Zirun"'
Author:
Eskimez, Sefik Emre, Wang, Xiaofei, Thakker, Manthan, Li, Canrun, Tsai, Chung-Hsien, Xiao, Zhen, Yang, Hemin, Zhu, Zirun, Tang, Min, Tan, Xu, Liu, Yanqing, Zhao, Sheng, Kanda, Naoyuki
This paper introduces Embarrassingly Easy Text-to-Speech (E2 TTS), a fully non-autoregressive zero-shot text-to-speech system that offers human-level naturalness and state-of-the-art speaker similarity and intelligibility. In the E2 TTS framework, the …
External link:
http://arxiv.org/abs/2406.18009
Author:
Wang, Xiaofei, Eskimez, Sefik Emre, Thakker, Manthan, Yang, Hemin, Zhu, Zirun, Tang, Min, Xia, Yufei, Li, Jinzhu, Zhao, Sheng, Li, Jinyu, Kanda, Naoyuki
Recently, zero-shot text-to-speech (TTS) systems, capable of synthesizing any speaker's voice from a short audio prompt, have made rapid advancements. However, the quality of the generated speech significantly deteriorates when the audio prompt contains …
External link:
http://arxiv.org/abs/2406.05699
Author:
Eskimez, Sefik Emre, Wang, Xiaofei, Thakker, Manthan, Tsai, Chung-Hsien, Li, Canrun, Xiao, Zhen, Yang, Hemin, Zhu, Zirun, Tang, Min, Li, Jinyu, Zhao, Sheng, Kanda, Naoyuki
Accurate control of the total duration of generated speech by adjusting the speech rate is crucial for various text-to-speech (TTS) applications. However, the impact of adjusting the speech rate on speech quality, such as intelligibility and speaker …
External link:
http://arxiv.org/abs/2406.04281
Author:
Kanda, Naoyuki, Wang, Xiaofei, Eskimez, Sefik Emre, Thakker, Manthan, Yang, Hemin, Zhu, Zirun, Tang, Min, Li, Canrun, Tsai, Chung-Hsien, Xiao, Zhen, Xia, Yufei, Li, Jinzhu, Liu, Yanqing, Zhao, Sheng, Zeng, Michael
Laughter is one of the most expressive and natural aspects of human speech, conveying emotions, social cues, and humor. However, most text-to-speech (TTS) systems lack the ability to produce realistic and appropriate laughter sounds, limiting their …
External link:
http://arxiv.org/abs/2402.07383
Audio-visual speech enhancement (AV-SE) methods utilize auxiliary visual cues to enhance speakers' voices. Therefore, technically they should be able to outperform the audio-only speech enhancement (SE) methods. However, there are few works in the literature …
External link:
http://arxiv.org/abs/2303.07005
Author:
Yoshioka, Takuya, Wang, Xiaofei, Wang, Dongmei, Tang, Min, Zhu, Zirun, Chen, Zhuo, Kanda, Naoyuki
Continuous speech separation using a microphone array was shown to be promising in dealing with the speech overlap problem in natural conversation transcription. This paper proposes VarArray, an array-geometry-agnostic speech separation neural network …
External link:
http://arxiv.org/abs/2110.05745
Author:
Eskimez, Sefik Emre, Wang, Xiaofei, Tang, Min, Yang, Hemin, Zhu, Zirun, Chen, Zhuo, Wang, Huaming, Yoshioka, Takuya
With the surge of online meetings, it has become more critical than ever to provide high-quality speech audio and live captioning under various noise conditions. However, most monaural speech enhancement (SE) models introduce processing artifacts and …
External link:
http://arxiv.org/abs/2106.02896
Based on Foster et al.'s lenses, various bidirectional programming languages and systems have been developed for helping the user to write correct data synchronisers. The two well-behavedness laws of lenses, namely Correctness and Hippocraticness, are …
External link:
http://arxiv.org/abs/2001.02031
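The entry above mentions the two well-behavedness laws of Foster et al.'s lenses, Correctness and Hippocraticness. The following is a minimal sketch of how a lens and checks for those two laws might look, assuming a toy source (a record) and a view that exposes one field; all function names here are illustrative, not taken from any particular lens library.

```python
def get(source):
    # Forward direction: extract the view from the source.
    return source["name"]

def put(source, view):
    # Backward direction: update the source to reflect the new view,
    # preserving the parts of the source the view does not cover.
    updated = dict(source)
    updated["name"] = view
    return updated

def check_correctness(source, view):
    # Correctness (PutGet): after putting a view, get must return that view.
    return get(put(source, view)) == view

def check_hippocraticness(source):
    # Hippocraticness (GetPut): putting back the current view must
    # leave the source unchanged.
    return put(source, get(source)) == source

src = {"name": "config.yaml", "version": 3}
assert check_correctness(src, "data.json")
assert check_hippocraticness(src)
```

A well-behaved lens must satisfy both checks for every source and view; bidirectional languages in this line of work aim to guarantee the laws by construction rather than by testing.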
Author:
Shao, Xiaoliang, Liu, Yemin, Wang, Baolong, Li, Xianting, Chen, Jiujiu, Zhu, Zirun, Ma, Xiaojun
Published in:
Building and Environment, 1 July 2023, Vol. 239.
Author:
Zhu, Zirun (zhu@nii.ac.jp), Ko, Hsiang-Shang, Zhang, Yongzhe, Martins, Pedro, Saraiva, João, Hu, Zhenjiang
Published in:
New Generation Computing, July 2020, Vol. 38, Issue 3, pp. 423-476.