Search results

Report

Multi-modal Adversarial Training for Zero-Shot Voice Cloning

Authors: Janiczek, John; Chong, Dading; Dai, Dongyang; Faria, Arlo; Wang, Chao; Wang, Tao; Liu, Yuzong

A text-to-speech (TTS) model trained to reconstruct speech given text tends towards predictions that are close to the average characteristics of a dataset, failing to model the variations that make human speech sound natural. This problem is magnifie

External link: http://arxiv.org/abs/2408.15916

Report

Toward Zero Oracle Word Error Rate on the Switchboard Benchmark

Authors: Faria, Arlo; Janin, Adam; Riedhammer, Korbinian; Adkoli, Sidhi

The "Switchboard benchmark" is a very well-known test set in automatic speech recognition (ASR) research, establishing record-setting performance for systems that claim human-level transcription accuracy. This work highlights lesser-known practical c

External link: http://arxiv.org/abs/2206.06192

Efficient Pitch-based Estimation of VTLN Warp Factors

Authors: Faria, Arlo; Gelbart, David

To reduce inter-speaker variability, vocal tract length normalization (VTLN) is commonly used to transform acoustic features for automatic speech recognition (ASR). The warp factors used in this process are usually derived by maximum likelihood (ML)

External link: https://explore.openaire.eu/search/publication?articleId=od_______463::b89a9dd6feaa249f7e4311dc437ffd64
http://hdl.handle.net/1842/1042

Conference

The TAO of ATWV: Probing the mysteries of keyword search performance.

Authors: Wegmann, Steven; Faria, Arlo; Janin, Adam; Riedhammer, Korbinian; Morgan, Nelson

Published in: 2013 IEEE Workshop on Automatic Speech Recognition & Understanding; 2013, p192-197, 6p

Conference

When a mismatch can be good.

Authors: Faria, Arlo; Morgan, Nelson

Published in: Proceedings of the 2008 ACM Symposium: Applied Computing; 3/16/2008, p1574-1577, 4p

Book

Accent Classification for Speech Recognition.

Authors: Renals, Steve; Bengio, Samy; Faria, Arlo

Published in: Machine Learning for Multimodal Interaction (9783540325499); 2006, p285-293, 9p

Academic article

Building A Highly Accurate Mandarin Speech Recognizer With Language-Independent Technologies and Language-Dependent Modules.

Authors: Mei-Yuh Hwang; Gang Peng; Ostendorf, Mari; Wen Wang; Faria, Arlo; Heidel, Aaron

Published in: IEEE Transactions on Audio, Speech & Language Processing; Sep2009, Vol. 17 Issue 7, p1253-1262, 10p, 3 Diagrams, 13 Charts
