Showing 1 - 10 of 311 for search: '"Heiga Zen"'
Author:
Abhayjeet Singh, Amala Nagireddi, Anjali Jayakumar, Deekshitha G, Jesuraja Bandekar, Roopa R, Sandhya Badiger, Sathvik Udupa, Saurabh Kumar, Prasanta Kumar Ghosh, Hema A Murthy, Heiga Zen, Pranaw Kumar, Kamal Kant, Amol Bole, Bira Chandra Singh, Keiichi Tokuda, Mark Hasegawa-Johnson, Philipp Olbrich
Published in:
IEEE Open Journal of Signal Processing, Vol 5, Pp 790-798 (2024)
The Lightweight, Multi-speaker, Multi-lingual Indic Text-to-Speech (LIMMITS'23) challenge is organized as part of the ICASSP 2023 Signal Processing Grand Challenge. LIMMITS'23 aims at the development of a lightweight, multi-speaker, multi-lingual Text-to-Speech…
External link:
https://doaj.org/article/30416475bd804b8ea7e7a6b9878069a3
Published in:
IEEE Transactions on Affective Computing. 14:3-5
The papers in this special section focus on affective speech and language synthesis, generation, and conversion. As an inseparable and crucial part of spoken language, emotions play a substantial role in human-human and human-technology conversation.
Denoising diffusion probabilistic models (DDPMs) and generative adversarial networks (GANs) are popular generative models for neural vocoders. The DDPMs and GANs can be characterized by the iterative denoising framework and adversarial training, respectively…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::806a1952bf9d1889dbb9e0babbc78004
http://arxiv.org/abs/2210.01029
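The entry above characterizes DDPM vocoders by iterative denoising and GAN vocoders by single-pass adversarial generation. Purely as an illustration of that contrast, the following Python sketch implements a generic epsilon-prediction DDPM sampling loop with an assumed linear noise schedule and a placeholder network; it is not the model described in the paper.

import numpy as np

def ddpm_vocoder_sample(denoise_fn, length, num_steps=50, seed=0):
    """Generate a waveform by iteratively denoising Gaussian noise (generic DDPM sampler)."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.05, num_steps)        # assumed linear noise schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    x = rng.standard_normal(length)                   # start from pure noise
    for t in reversed(range(num_steps)):
        eps_hat = denoise_fn(x, t)                    # network's noise estimate at step t
        # Posterior mean under the epsilon-prediction parameterization.
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:                                     # re-inject noise except at the final step
            x += np.sqrt(betas[t]) * rng.standard_normal(length)
    return x

# A GAN vocoder, by contrast, maps conditioning features to audio in a single pass,
#   waveform = generator(mel_spectrogram),
# and is trained with an adversarial loss rather than an iterative denoising objective.

dummy_denoiser = lambda x, t: np.zeros_like(x)        # placeholder for a trained network
print(ddpm_vocoder_sample(dummy_denoiser, length=16000).shape)   # (16000,)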
Author:
Lev Finkelstein, Heiga Zen, Norman Casagrande, Chun-an Chan, Ye Jia, Tom Kenter, Alex Petelin, Jonathan Shen, Vincent Wan, Yu Zhang, Yonghui Wu, Robert Clark
Transfer tasks in text-to-speech (TTS) synthesis - where one or more aspects of the speech of one set of speakers are transferred to another set of speakers that do not feature these aspects originally - remain challenging. One of the challenges…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::9b4ec8728b0902a310793207cc888c8a
http://arxiv.org/abs/2208.13183
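The entry above concerns transferring selected speech attributes from one set of speakers to another. As a loose illustration of one common setup, and not necessarily the approach taken in this paper, the sketch below conditions a stand-in acoustic model on separate speaker and attribute embeddings so that an attribute learned from one group of speakers can be paired with a different target voice; every name, table, and shape here is an assumption.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical lookup tables of learned embeddings.
speaker_table = {"spk_a": rng.standard_normal(16), "spk_b": rng.standard_normal(16)}
attribute_table = {"accent_x": rng.standard_normal(8), "accent_y": rng.standard_normal(8)}

def acoustic_model(phoneme_ids, speaker_emb, attribute_emb):
    """Stand-in for a neural acoustic model; returns a dummy mel-spectrogram."""
    num_frames = 4 * len(phoneme_ids)                     # assumed fixed upsampling rate
    cond = np.concatenate([speaker_emb, attribute_emb])   # conditioning vector
    return np.outer(np.ones(num_frames), cond)            # placeholder frame-level output

# Transfer setup: the target speaker's identity combined with an attribute embedding
# learned from other speakers who do exhibit that attribute.
phonemes = [12, 7, 33, 5]
mel = acoustic_model(phonemes, speaker_table["spk_a"], attribute_table["accent_y"])
print(mel.shape)   # (16, 24)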
Published in:
Information Processing & Management. 60:103249
Author:
Takaaki Saeki, Heiga Zen, Zhehuai Chen, Nobuyuki Morioka, Gary Wang, Yu Zhang, Ankur Bapna, Andrew Rosenberg, Bhuvana Ramabhadran
This paper proposes Virtuoso, a massively multilingual speech-text joint semi-supervised learning framework for text-to-speech synthesis (TTS) models. Existing multilingual TTS typically supports tens of languages, which are a small fraction of the…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::52500f6059f16130fbb660682cd9fdfb
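Virtuoso is described as a speech-text joint semi-supervised framework. The sketch below only illustrates the general shape of such training, a weighted mix of a supervised TTS loss on paired data with unsupervised losses on text-only and speech-only data; the loss names, weights, and model interface are assumptions, not Virtuoso's actual objectives or API.

def joint_training_step(model, paired_batch, text_only_batch, speech_only_batch,
                        w_sup=1.0, w_text=0.3, w_speech=0.3):
    """Weighted multi-task loss over paired and unpaired data (illustrative only)."""
    loss_sup = model.tts_loss(paired_batch)               # text -> speech, supervised
    loss_text = model.text_loss(text_only_batch)          # unlabeled-text objective
    loss_speech = model.speech_loss(speech_only_batch)    # unlabeled-speech objective
    return w_sup * loss_sup + w_text * loss_text + w_speech * loss_speech

class DummyModel:
    """Placeholder exposing the three loss terms; a real model would compute them."""
    def tts_loss(self, batch): return 1.00
    def text_loss(self, batch): return 0.50
    def speech_loss(self, batch): return 0.70

print(joint_training_step(DummyModel(), None, None, None))   # 1.36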
Author:
Bhuvana Ramabhadran, Mohammadreza Ghodsi, Yinghui Huang, Heiga Zen, Pedro J. Moreno, Yu Zhang, Zhehuai Chen, Andrew Rosenberg, Jesse Emond, Gary Wang
Published in:
Interspeech 2021.
Published in:
Interspeech 2021.
This paper introduces Parallel Tacotron 2, a non-autoregressive neural text-to-speech model with a fully differentiable duration model which does not require supervised duration signals. The duration model is based on a novel attention mechanism and…
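The duration model above is described as attention-based and fully differentiable, so it can be trained without supervised duration labels. As a stand-in for that idea (not the paper's actual mechanism), the sketch below upsamples token-level representations to frame level with a soft Gaussian-weighted alignment derived from predicted durations, which keeps the whole mapping differentiable.

import numpy as np

def soft_upsample(token_reprs, durations, sigma=1.0):
    """token_reprs: (K, D) token encodings; durations: (K,) predicted durations in frames."""
    ends = np.cumsum(durations)                 # cumulative end time of each token
    centers = ends - durations / 2.0            # token centers on the frame axis
    num_frames = int(round(ends[-1]))
    t = np.arange(num_frames) + 0.5             # frame-center time stamps
    # Soft alignment: each frame attends to every token with a Gaussian weight.
    logits = -((t[:, None] - centers[None, :]) ** 2) / (2.0 * sigma ** 2)
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ token_reprs                # (num_frames, D) frame-level features

tokens = np.random.default_rng(0).standard_normal((4, 8))    # 4 tokens, dim 8
frames = soft_upsample(tokens, durations=np.array([2.0, 3.5, 1.5, 4.0]))
print(frames.shape)   # (11, 8)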
Published in:
ICASSP
Although neural end-to-end text-to-speech models can synthesize highly natural speech, there is still room for improvement in their efficiency and naturalness. This paper proposes a non-autoregressive neural text-to-speech model augmented with a variational…
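The model above is augmented with a variational component. As a generic illustration of the variational machinery typically involved, the reparameterization trick and a KL regularizer, and not the paper's specific architecture, here is a small numpy sketch with assumed shapes and weights.

import numpy as np

rng = np.random.default_rng(0)

def sample_latent(mu, log_var):
    """Reparameterization trick: z = mu + sigma * eps, differentiable w.r.t. mu and log_var."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)

# Posterior parameters would come from a reference encoder over the target speech.
mu, log_var = rng.standard_normal(16), rng.standard_normal(16) * 0.1
z = sample_latent(mu, log_var)                            # latent conditioning the decoder
loss = 1.23 + 1e-2 * kl_to_standard_normal(mu, log_var)   # placeholder reconstruction + beta * KL
print(z.shape, round(loss, 3))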
Author:
Michael L. Seltzer, Reinhold Haeb-Umbach, Shinji Watanabe, Bjorn Hoffmeister, Heiga Zen, Michiel Bacchiani, Mehrez Souden, Tomohiro Nakatani
Published in:
IEEE Signal Processing Magazine. 36:111-124
Once a popular theme of futuristic science fiction or far-fetched technology forecasts, digital home assistants with a spoken language interface have become a ubiquitous commodity today. This success has been made possible by major advancements in signal…