A Compact Representation of Pronunciation Lexicons Using Finite-state Super Transducers
Author: | Žiga Golob, Boštjan Vesnicer, Jerneja Žganec Gros, Mario Žganec, Simon Dobrišek |
---|---|
Language: | English |
Publication year: | 2017 |
Subject: |
lcsh:Philology. Linguistics; pronunciation dictionary; speech synthesis; lcsh:P1-1091; Computer Science::Sound; Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing); finite-state transducers; Computer Science::Formal Languages and Automata Theory |
Source: | Slovenščina 2.0: Empirične, aplikativne in interdisciplinarne raziskave, Vol 4, Iss 1, Pp 79-96 (2017) |
ISSN: | 2335-2736 |
Description: | Computer models based on finite-state transducers are well suited for compact representations of pronunciation lexicons that are used both in speech synthesis and in speech recognition. In this paper, we present the finite-state super transducer, a new type of finite-state transducer that represents a pronunciation lexicon with fewer states and transitions than a conventional minimized and determinized finite-state transducer. A finite-state super transducer is a deterministic transducer that, in addition to the words contained in the pronunciation lexicon, also accepts some out-of-dictionary words. The resulting allophone transcriptions for these words can be erroneous, but we demonstrate that these errors are comparable to those of state-of-the-art methods for grapheme-to-phoneme conversion. The procedure for building finite-state super transducers and a validation of their performance are demonstrated on the SI-PRON pronunciation lexicon. We also analyze several properties of finite-state transducers with respect to the minimum size obtained by determinization and minimization. We show that, for highly inflected languages, this minimum size begins to decrease once the number of words in the represented pronunciation dictionary reaches a certain threshold. |
Database: | OpenAIRE |
External link: |
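
The description refers to a conventional minimized and determinized transducer as the baseline that the super transducer improves on. The sketch below illustrates only that baseline idea on a toy lexicon; it is not the authors' super transducer construction, which the record does not detail. It assumes one output symbol per input letter (a simplification), merges states with identical outgoing behaviour in an acyclic pair-labelled automaton, and uses hypothetical names (`build_trie`, `minimize`, `transduce`) and made-up entries that are not taken from SI-PRON.

```python
from typing import Dict, List, Optional, Set, Tuple

# state -> {input letter: (output symbol, next state)}
Transitions = Dict[int, Dict[str, Tuple[str, int]]]


def build_trie(lexicon: Dict[str, List[str]]) -> Tuple[Transitions, Set[int]]:
    """Build a prefix-tree transducer (toy assumption: one phone per letter)."""
    trans: Transitions = {0: {}}
    finals: Set[int] = set()
    for word, phones in lexicon.items():
        if len(word) != len(phones):
            raise ValueError(f"unaligned entry: {word}")
        state = 0
        for letter, phone in zip(word, phones):
            if letter in trans[state]:
                out, nxt = trans[state][letter]
                if out != phone:
                    raise ValueError("toy builder needs one output per shared prefix")
            else:
                nxt = len(trans)  # allocate a fresh state id
                trans[nxt] = {}
                trans[state][letter] = (phone, nxt)
            state = nxt
        finals.add(state)
    return trans, finals


def minimize(trans: Transitions, finals: Set[int]) -> Tuple[Transitions, Set[int], int]:
    """Merge states with identical signatures, bottom-up (acyclic case only)."""
    registry: Dict[tuple, int] = {}  # signature -> canonical state id
    canon: Dict[int, int] = {}       # old state id -> canonical state id

    def visit(state: int) -> int:
        arcs = tuple(sorted((a, out, visit(nxt)) for a, (out, nxt) in trans[state].items()))
        sig = (state in finals, arcs)
        canon[state] = registry.setdefault(sig, len(registry))
        return canon[state]

    start = visit(0)
    new_trans: Transitions = {}
    new_finals: Set[int] = set()
    for old, new in canon.items():
        new_trans[new] = {a: (out, canon[nxt]) for a, (out, nxt) in trans[old].items()}
        if old in finals:
            new_finals.add(new)
    return new_trans, new_finals, start


def transduce(trans: Transitions, finals: Set[int], start: int, word: str) -> Optional[List[str]]:
    """Return the output symbols for `word`, or None if the word is not accepted."""
    state, output = start, []
    for letter in word:
        if letter not in trans[state]:
            return None
        phone, state = trans[state][letter]
        output.append(phone)
    return output if state in finals else None


def count_transitions(trans: Transitions) -> int:
    return sum(len(arcs) for arcs in trans.values())


# Toy inflected forms; the "phones" are placeholder symbols, not real allophones.
toy_lexicon = {w: list(w) for w in ["miza", "mize", "mizi", "koza", "koze", "kozi"]}
trie, trie_finals = build_trie(toy_lexicon)
mini, mini_finals, mini_start = minimize(trie, trie_finals)
```

On this toy input the shared inflectional endings collapse into common states, which is the effect that makes transducers compact for inflected vocabularies. The minimized automaton here still rejects every word outside the lexicon; according to the description, the super transducer relaxes exactly that property to reduce size further.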