Showing 1 - 10 of 22 for the search: '"Zhu, Zirun"'
Author:
Eskimez, Sefik Emre, Wang, Xiaofei, Thakker, Manthan, Li, Canrun, Tsai, Chung-Hsien, Xiao, Zhen, Yang, Hemin, Zhu, Zirun, Tang, Min, Tan, Xu, Liu, Yanqing, Zhao, Sheng, Kanda, Naoyuki
This paper introduces Embarrassingly Easy Text-to-Speech (E2 TTS), a fully non-autoregressive zero-shot text-to-speech system that offers human-level naturalness and state-of-the-art speaker similarity and intelligibility. In the E2 TTS framework, the …
External link:
http://arxiv.org/abs/2406.18009
Author:
Wang, Xiaofei, Eskimez, Sefik Emre, Thakker, Manthan, Yang, Hemin, Zhu, Zirun, Tang, Min, Xia, Yufei, Li, Jinzhu, Zhao, Sheng, Li, Jinyu, Kanda, Naoyuki
Recently, zero-shot text-to-speech (TTS) systems, capable of synthesizing any speaker's voice from a short audio prompt, have made rapid advancements. However, the quality of the generated speech significantly deteriorates when the audio prompt contains …
External link:
http://arxiv.org/abs/2406.05699
Author:
Eskimez, Sefik Emre, Wang, Xiaofei, Thakker, Manthan, Tsai, Chung-Hsien, Li, Canrun, Xiao, Zhen, Yang, Hemin, Zhu, Zirun, Tang, Min, Li, Jinyu, Zhao, Sheng, Kanda, Naoyuki
Accurate control of the total duration of generated speech by adjusting the speech rate is crucial for various text-to-speech (TTS) applications. However, the impact of adjusting the speech rate on speech quality, such as intelligibility and speaker …
External link:
http://arxiv.org/abs/2406.04281
Author:
Kanda, Naoyuki, Wang, Xiaofei, Eskimez, Sefik Emre, Thakker, Manthan, Yang, Hemin, Zhu, Zirun, Tang, Min, Li, Canrun, Tsai, Chung-Hsien, Xiao, Zhen, Xia, Yufei, Li, Jinzhu, Liu, Yanqing, Zhao, Sheng, Zeng, Michael
Laughter is one of the most expressive and natural aspects of human speech, conveying emotions, social cues, and humor. However, most text-to-speech (TTS) systems lack the ability to produce realistic and appropriate laughter sounds, limiting their …
External link:
http://arxiv.org/abs/2402.07383
Audio-visual speech enhancement (AV-SE) methods utilize auxiliary visual cues to enhance speakers' voices. Therefore, technically they should be able to outperform the audio-only speech enhancement (SE) methods. However, there are few works in the literature …
External link:
http://arxiv.org/abs/2303.07005
Author:
Yoshioka, Takuya, Wang, Xiaofei, Wang, Dongmei, Tang, Min, Zhu, Zirun, Chen, Zhuo, Kanda, Naoyuki
Continuous speech separation using a microphone array was shown to be promising in dealing with the speech overlap problem in natural conversation transcription. This paper proposes VarArray, an array-geometry-agnostic speech separation neural network …
External link:
http://arxiv.org/abs/2110.05745
Author:
Eskimez, Sefik Emre, Wang, Xiaofei, Tang, Min, Yang, Hemin, Zhu, Zirun, Chen, Zhuo, Wang, Huaming, Yoshioka, Takuya
With the surge of online meetings, it has become more critical than ever to provide high-quality speech audio and live captioning under various noise conditions. However, most monaural speech enhancement (SE) models introduce processing artifacts and …
External link:
http://arxiv.org/abs/2106.02896
Based on Foster et al.'s lenses, various bidirectional programming languages and systems have been developed for helping the user to write correct data synchronisers. The two well-behavedness laws of lenses, namely Correctness and Hippocraticness, are …
External link:
http://arxiv.org/abs/2001.02031
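The entry above mentions the two well-behavedness laws of Foster et al.'s lenses, Correctness and Hippocraticness. The following is a minimal sketch of how a lens and checks for those two laws might look, assuming a toy source (a record) and a view that exposes one field; all function names here are illustrative, not taken from any particular lens library.

```python
def get(source):
    # Forward direction: extract the view from the source.
    return source["name"]

def put(source, view):
    # Backward direction: update the source to reflect the new view,
    # preserving the parts of the source the view does not cover.
    updated = dict(source)
    updated["name"] = view
    return updated

def check_correctness(source, view):
    # Correctness (PutGet): after putting a view, get must return that view.
    return get(put(source, view)) == view

def check_hippocraticness(source):
    # Hippocraticness (GetPut): putting back the current view must
    # leave the source unchanged.
    return put(source, get(source)) == source

src = {"name": "config.yaml", "version": 3}
assert check_correctness(src, "data.json")
assert check_hippocraticness(src)
```

A well-behaved lens must satisfy both checks for every source and view; bidirectional languages in this line of work aim to guarantee the laws by construction rather than by testing.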
Author:
Shao, Xiaoliang, Liu, Yemin, Wang, Baolong, Li, Xianting, Chen, Jiujiu, Zhu, Zirun, Ma, Xiaojun
Published in:
Building and Environment, 1 July 2023, Vol. 239.
Author:
Zhu, Zirun (zhu@nii.ac.jp), Ko, Hsiang-Shang, Zhang, Yongzhe, Martins, Pedro, Saraiva, João, Hu, Zhenjiang
Published in:
New Generation Computing, July 2020, Vol. 38, Issue 3, pp. 423-476.