Search results - "Klejch, Ondrej"

Report

Beyond Oversmoothing: Evaluating DDPM and MSE for Scalable Speech Synthesis in ASR

Authors: Minixhofer, Christoph, Klejch, Ondrej, Bell, Peter

Synthetically generated speech has rapidly approached human levels of naturalness. However, the paradox remains that ASR systems, when trained on TTS output that is judged as natural by humans, continue to perform badly on real speech. In this work,

External link: http://arxiv.org/abs/2410.12279

Report

TTSDS -- Text-to-Speech Distribution Score

Authors: Minixhofer, Christoph, Klejch, Ondřej, Bell, Peter

Many recently published Text-to-Speech (TTS) systems produce audio close to real speech. However, TTS evaluation needs to be revisited to make sense of the results obtained with the new architectures, approaches and datasets. We propose evaluating th

External link: http://arxiv.org/abs/2407.12707

Report

Speech collage: code-switched audio generation by collaging monolingual corpora

Authors: Hussein, Amir, Zeinali, Dorsa, Klejch, Ondřej, Wiesner, Matthew, Yan, Brian, Chowdhury, Shammur, Ali, Ahmed, Watanabe, Shinji, Khudanpur, Sanjeev

Designing effective automatic speech recognition (ASR) systems for Code-Switching (CS) often depends on the availability of the transcribed CS resources. To address data scarcity, this paper introduces Speech Collage, a method that synthesizes CS dat

External link: http://arxiv.org/abs/2309.15674

Report

Acoustic Word Embeddings for Untranscribed Target Languages with Continued Pretraining and Learned Pooling

Authors: Sanabria, Ramon, Klejch, Ondrej, Tang, Hao, Goldwater, Sharon

Acoustic word embeddings are typically created by training a pooling function using pairs of word-like units. For unsupervised systems, these are mined using k-nearest neighbor (KNN) search, which is slow. Recently, mean-pooled representations from a

External link: http://arxiv.org/abs/2306.02153

Report

ASR and Emotional Speech: A Word-Level Investigation of the Mutual Impact of Speech and Emotion Recognition

Authors: Li, Yuanchao, Zhao, Zeyu, Klejch, Ondrej, Bell, Peter, Lai, Catherine

In Speech Emotion Recognition (SER), textual data is often used alongside audio signals to address their inherent variability. However, the reliance on human annotated text in most research hinders the development of practical SER systems. To overcom

External link: http://arxiv.org/abs/2305.16065

Report

The Edinburgh International Accents of English Corpus: Towards the Democratization of English ASR

Authors: Sanabria, Ramon, Bogoychev, Nikolay, Markl, Nina, Carmantini, Andrea, Klejch, Ondrej, Bell, Peter

English is the most widely spoken language in the world, used daily by millions of people as a first or second language in many different contexts. As a result, there are many varieties of English. Although the great many advances in English automati

External link: http://arxiv.org/abs/2303.18110

Report

Evaluating and reducing the distance between synthetic and real speech distributions

Authors: Minixhofer, Christoph, Klejch, Ondřej, Bell, Peter

While modern Text-to-Speech (TTS) systems can produce natural-sounding speech, they remain unable to reproduce the full diversity found in natural speech data. We consider the distribution of all possible real speech samples that could be generated b

External link: http://arxiv.org/abs/2211.16049

Report

Towards Zero-Shot Code-Switched Speech Recognition

Authors: Yan, Brian, Wiesner, Matthew, Klejch, Ondrej, Jyothi, Preethi, Watanabe, Shinji

In this work, we seek to build effective code-switched (CS) automatic speech recognition systems (ASR) under the zero-shot setting where no transcribed CS speech data is available for training. Previously proposed frameworks which conditionally facto

External link: http://arxiv.org/abs/2211.01458

Report

Mask-combine Decoding and Classification Approach for Punctuation Prediction with real-time Inference Constraints

Authors: Minixhofer, Christoph, Klejch, Ondřej, Bell, Peter

In this work, we unify several existing decoding strategies for punctuation prediction in one framework and introduce a novel strategy which utilises multiple predictions at each word across different windows. We show that significant improvements ca

External link: http://arxiv.org/abs/2112.08098

Report

Deciphering Speech: a Zero-Resource Approach to Cross-Lingual Transfer in ASR

Authors: Klejch, Ondrej, Wallington, Electra, Bell, Peter

We present a method for cross-lingual training of an ASR system using absolutely no transcribed training data from the target language, and with no phonetic knowledge of the language in question. Our approach uses a novel application of a decipherment a

External link: http://arxiv.org/abs/2111.06799