Search results - "Thakker, Manthan"

Report

E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS

Authors: Eskimez, Sefik Emre, Wang, Xiaofei, Thakker, Manthan, Li, Canrun, Tsai, Chung-Hsien, Xiao, Zhen, Yang, Hemin, Zhu, Zirun, Tang, Min, Tan, Xu, Liu, Yanqing, Zhao, Sheng, Kanda, Naoyuki

This paper introduces Embarrassingly Easy Text-to-Speech (E2 TTS), a fully non-autoregressive zero-shot text-to-speech system that offers human-level naturalness and state-of-the-art speaker similarity and intelligibility. In the E2 TTS framework, th

External link: http://arxiv.org/abs/2406.18009

Report

An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS

Authors: Wang, Xiaofei, Eskimez, Sefik Emre, Thakker, Manthan, Yang, Hemin, Zhu, Zirun, Tang, Min, Xia, Yufei, Li, Jinzhu, Zhao, Sheng, Li, Jinyu, Kanda, Naoyuki

Recently, zero-shot text-to-speech (TTS) systems, capable of synthesizing any speaker's voice from a short audio prompt, have made rapid advancements. However, the quality of the generated speech significantly deteriorates when the audio prompt conta

External link: http://arxiv.org/abs/2406.05699

Report

Total-Duration-Aware Duration Modeling for Text-to-Speech Systems

Authors: Eskimez, Sefik Emre, Wang, Xiaofei, Thakker, Manthan, Tsai, Chung-Hsien, Li, Canrun, Xiao, Zhen, Yang, Hemin, Zhu, Zirun, Tang, Min, Li, Jinyu, Zhao, Sheng, Kanda, Naoyuki

Accurate control of the total duration of generated speech by adjusting the speech rate is crucial for various text-to-speech (TTS) applications. However, the impact of adjusting the speech rate on speech quality, such as intelligibility and speaker

External link: http://arxiv.org/abs/2406.04281

Report

Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like

Authors: Kanda, Naoyuki, Wang, Xiaofei, Eskimez, Sefik Emre, Thakker, Manthan, Yang, Hemin, Zhu, Zirun, Tang, Min, Li, Canrun, Tsai, Chung-Hsien, Xiao, Zhen, Xia, Yufei, Li, Jinzhu, Liu, Yanqing, Zhao, Sheng, Zeng, Michael

Laughter is one of the most expressive and natural aspects of human speech, conveying emotions, social cues, and humor. However, most text-to-speech (TTS) systems lack the ability to produce realistic and appropriate laughter sounds, limiting their a

External link: http://arxiv.org/abs/2402.07383

Report

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer

Authors: Wang, Xiaofei, Thakker, Manthan, Chen, Zhuo, Kanda, Naoyuki, Eskimez, Sefik Emre, Chen, Sanyuan, Tang, Min, Liu, Shujie, Li, Jinyu, Yoshioka, Takuya

Recent advancements in generative speech models based on audio-text prompts have enabled remarkable innovations like high-quality zero-shot text-to-speech. However, existing models still face limitations in handling diverse audio-text speech generati

External link: http://arxiv.org/abs/2308.06873

Report

Fast Real-time Personalized Speech Enhancement: End-to-End Enhancement Network (E3Net) and Knowledge Distillation

Authors: Thakker, Manthan, Eskimez, Sefik Emre, Yoshioka, Takuya, Wang, Huaming

This paper investigates how to improve the runtime speed of personalized speech enhancement (PSE) networks while maintaining the model quality. Our approach includes two aspects: architecture and knowledge distillation (KD). We propose an end-to-end

External link: http://arxiv.org/abs/2204.00771

Report

ICASSP 2022 Deep Noise Suppression Challenge

Authors: Dubey, Harishchandra, Gopal, Vishak, Cutler, Ross, Aazami, Ashkan, Matusevych, Sergiy, Braun, Sebastian, Eskimez, Sefik Emre, Thakker, Manthan, Yoshioka, Takuya, Gamper, Hannes, Aichner, Robert

The Deep Noise Suppression (DNS) challenge is designed to foster innovation in the area of noise suppression to achieve superior perceptual speech quality. This is the 4th DNS challenge, with the previous editions held at INTERSPEECH 2020, ICASSP 202

External link: http://arxiv.org/abs/2202.13288
