A study on adapting Czech automatic speech recognition system to Croatian language.

Authors: Nouza, Jan; Cerva, Petr; Zdansky, Jindrich; Kucharova, Michaela
Source: Proceedings ELMAR-2012; 1/1/2012, p227-230, 4p
Abstract: After successful adaptation of our Czech large-vocabulary speech recognition system to Slovak, we investigate the possibility of porting it to another Slavic language, Croatian in this case. We describe how we build a large lexicon (currently with 255 thousand entries) and a language model from publicly available Internet sources, and how an existing Czech acoustic model (AM) can be utilized for bootstrapping and training a model applicable to Croatian. For the AM adaptation we use the Croatian part of the GlobalPhone database. An independent evaluation is done on a test set made of transcribed broadcast recordings of Radio Pula. When using the original Czech acoustic model, the word error rate is 27.6 %; with the model adapted to Croatian, it is reduced to 19.4 %. [ABSTRACT FROM PUBLISHER]
Database: Complementary Index
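
Note: The abstract reports its results as word error rate (WER). The record does not say how the authors scored their system, so the following is only a minimal sketch of the conventional definition: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function names `word_error_rate` and `relative_reduction` are hypothetical and chosen here for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Conventional WER: word-level Levenshtein distance / reference length.

    Assumption: whitespace tokenization and exact string matching. Real
    evaluations usually normalize case, punctuation, and numerals first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, adapted: float) -> float:
    """Fraction by which the adapted WER improves on the baseline WER."""
    return (baseline - adapted) / baseline


if __name__ == "__main__":
    # Figures taken from the abstract: 27.6 % (Czech AM) and 19.4 % (adapted AM).
    print(f"{relative_reduction(27.6, 19.4):.1%}")
```

Applied to the two figures in the abstract, the drop from 27.6 % to 19.4 % is 8.2 points absolute, or roughly 30 % relative.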