Showing 1 - 10 of 25 for search: '"Klejch, Ondrej"'
Synthetically generated speech has rapidly approached human levels of naturalness. However, the paradox remains that ASR systems, when trained on TTS output that is judged as natural by humans, continue to perform badly on real speech. In this work, …
External link:
http://arxiv.org/abs/2410.12279
Many recently published Text-to-Speech (TTS) systems produce audio close to real speech. However, TTS evaluation needs to be revisited to make sense of the results obtained with the new architectures, approaches and datasets. We propose evaluating th…
External link:
http://arxiv.org/abs/2407.12707
Authors:
Hussein, Amir, Zeinali, Dorsa, Klejch, Ondřej, Wiesner, Matthew, Yan, Brian, Chowdhury, Shammur, Ali, Ahmed, Watanabe, Shinji, Khudanpur, Sanjeev
Designing effective automatic speech recognition (ASR) systems for Code-Switching (CS) often depends on the availability of the transcribed CS resources. To address data scarcity, this paper introduces Speech Collage, a method that synthesizes CS dat…
External link:
http://arxiv.org/abs/2309.15674
Acoustic word embeddings are typically created by training a pooling function using pairs of word-like units. For unsupervised systems, these are mined using k-nearest neighbor (KNN) search, which is slow. Recently, mean-pooled representations from a …
External link:
http://arxiv.org/abs/2306.02153
In Speech Emotion Recognition (SER), textual data is often used alongside audio signals to address their inherent variability. However, the reliance on human-annotated text in most research hinders the development of practical SER systems. To overcom…
External link:
http://arxiv.org/abs/2305.16065
Authors:
Sanabria, Ramon, Bogoychev, Nikolay, Markl, Nina, Carmantini, Andrea, Klejch, Ondrej, Bell, Peter
English is the most widely spoken language in the world, used daily by millions of people as a first or second language in many different contexts. As a result, there are many varieties of English. Although the great many advances in English automati…
External link:
http://arxiv.org/abs/2303.18110
While modern Text-to-Speech (TTS) systems can produce natural-sounding speech, they remain unable to reproduce the full diversity found in natural speech data. We consider the distribution of all possible real speech samples that could be generated b…
External link:
http://arxiv.org/abs/2211.16049
In this work, we seek to build effective code-switched (CS) automatic speech recognition systems (ASR) under the zero-shot setting where no transcribed CS speech data is available for training. Previously proposed frameworks which conditionally facto…
External link:
http://arxiv.org/abs/2211.01458
In this work, we unify several existing decoding strategies for punctuation prediction in one framework and introduce a novel strategy which utilises multiple predictions at each word across different windows. We show that significant improvements ca…
External link:
http://arxiv.org/abs/2112.08098
We present a method for cross-lingual training of an ASR system using absolutely no transcribed training data from the target language, and with no phonetic knowledge of the language in question. Our approach uses a novel application of a decipherment a…
External link:
http://arxiv.org/abs/2111.06799