Showing 1 - 10 of 19 results for the search: '"Huybrechts, Goeric"'
Author:
Peri, Raghuveer, Jayanthi, Sai Muralidhar, Ronanki, Srikanth, Bhatia, Anshu, Mundnich, Karel, Dingliwal, Saket, Das, Nilaksh, Hou, Zejiang, Huybrechts, Goeric, Vishnubhotla, Srikanth, Garcia-Romero, Daniel, Srinivasan, Sundararajan, Han, Kyu J, Kirchhoff, Katrin
Integrated Speech and Large Language Models (SLMs) that can follow speech instructions and generate relevant text responses have gained popularity lately. However, the safety and robustness of these models remain largely unclear. In this work, we…
External link:
http://arxiv.org/abs/2405.08317
Author:
Huybrechts, Goeric, Ronanki, Srikanth, Li, Xilai, Nosrati, Hadis, Bodapati, Sravan, Kirchhoff, Katrin
Conformer-based end-to-end models have become ubiquitous these days and are commonly used in both streaming and non-streaming automatic speech recognition (ASR). Techniques like dual-mode and dynamic chunk training helped unify streaming and non-streaming…
External link:
http://arxiv.org/abs/2306.08175
Recently, there has been an increasing interest in unifying streaming and non-streaming speech recognition models to reduce development, training and deployment cost. The best-known approaches rely on either window-based or dynamic chunk-based attention…
External link:
http://arxiv.org/abs/2304.09325
The availability of data in expressive styles across languages is limited, and recording sessions are costly and time consuming. To overcome these issues, we demonstrate how to build low-resource, neural text-to-speech (TTS) voices with only 1 hour of…
External link:
http://arxiv.org/abs/2207.14607
Author:
Gabryś, Adam, Huybrechts, Goeric, Ribeiro, Manuel Sam, Chien, Chung-Ming, Roth, Julian, Comini, Giulia, Barra-Chicote, Roberto, Perz, Bartek, Lorenzo-Trueba, Jaime
State-of-the-art text-to-speech (TTS) systems require several hours of recorded speech data to generate high-quality synthetic speech. When using reduced amounts of training data, standard TTS models suffer from speech quality and intelligibility degradation…
External link:
http://arxiv.org/abs/2202.08164
Author:
Ribeiro, Manuel Sam, Roth, Julian, Comini, Giulia, Huybrechts, Goeric, Gabrys, Adam, Lorenzo-Trueba, Jaime
We address the problem of cross-speaker style transfer for text-to-speech (TTS) using data augmentation via voice conversion. We assume to have a corpus of neutral non-expressive data from a target speaker and supporting conversational expressive data…
External link:
http://arxiv.org/abs/2202.05083
Author:
Shah, Raahil, Pokora, Kamil, Ezzerg, Abdelhamid, Klimkov, Viacheslav, Huybrechts, Goeric, Putrycz, Bartosz, Korzekwa, Daniel, Merritt, Thomas
Whilst recent neural text-to-speech (TTS) approaches produce high-quality speech, they typically require a large amount of recordings from the target speaker. In previous work, a 3-step method was proposed to generate high-quality TTS while greatly reducing…
External link:
http://arxiv.org/abs/2106.12896
Emotional voice conversion models adapt the emotion in speech without changing the speaker identity or linguistic content. They are less data hungry than text-to-speech models and make it possible to generate large amounts of emotional data for downstream tasks…
External link:
http://arxiv.org/abs/2101.05695
Author:
Huybrechts, Goeric, Merritt, Thomas, Comini, Giulia, Perz, Bartek, Shah, Raahil, Lorenzo-Trueba, Jaime
While recent neural text-to-speech (TTS) systems perform remarkably well, they typically require a substantial amount of recordings from the target speaker reading in the desired speaking style. In this work, we present a novel 3-step methodology to…
External link:
http://arxiv.org/abs/2011.05707
We present an approach to synthesize whisper by applying a handcrafted signal processing recipe and Voice Conversion (VC) techniques to convert normally phonated speech to whispered speech. We investigate using Gaussian Mixture Models (GMM) and Deep…
External link:
http://arxiv.org/abs/1912.05289