A Compact Representation of Pronunciation Lexicons Using Finite-state Super Transducers

Authors: Žiga Golob, Boštjan Vesnicer, Jerneja Žganec Gros, Mario Žganec, Simon Dobrišek
Language: English
Year of publication: 2017
Subject:
Source: Slovenščina 2.0: Empirične, aplikativne in interdisciplinarne raziskave, Vol 4, Iss 1, Pp 79-96 (2017)
ISSN: 2335-2736
Description: Computer models based on finite-state transducers are well suited for compact representations of pronunciation lexicons, which are used in both speech synthesis and speech recognition. In this paper, we present the finite-state super transducer, a new type of finite-state transducer that represents a pronunciation lexicon with fewer states and transitions than a conventional determinized and minimized finite-state transducer. A finite-state super transducer is a deterministic transducer that, in addition to the words contained in the pronunciation lexicon, also accepts some out-of-dictionary words. The resulting allophone transcriptions for these words can be erroneous, but we demonstrate that these errors are comparable to those of state-of-the-art methods for grapheme-to-phoneme conversion. The procedure for building finite-state super transducers and a validation of their performance are demonstrated on the SI-PRON pronunciation lexicon. In addition, we analyze several properties of finite-state transducers with respect to the minimum size obtained through determinization and minimization. We show that for highly inflected languages this minimum size begins to decrease when the number of words in the represented pronunciation dictionary reaches a certain threshold.
Database: OpenAIRE
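
The size effect mentioned in the description can be illustrated with a toy sketch. It is not the authors' construction procedure and it is a simplification: it counts the states of a minimal deterministic acceptor (inputs only, no allophone outputs) rather than a transducer. The stems `ab`, `cd`, `ef` and suffixes `x`, `y`, `z` are hypothetical placeholders, and the function names are invented for this example. The point is only that a word set covering every stem and suffix combination can need fewer states than a smaller, irregular subset of it.

```python
from itertools import product


def build_trie(words):
    """Build a trie as {state: {symbol: next_state}} and return it with its final states."""
    trans = {0: {}}
    finals = set()
    for word in words:
        state = 0
        for ch in word:
            if ch not in trans[state]:
                new_state = len(trans)
                trans[new_state] = {}
                trans[state][ch] = new_state
            state = trans[state][ch]
        finals.add(state)
    return trans, finals


def minimal_state_count(words):
    """Count the states of the minimal deterministic acceptor for a finite word set.

    In a trie, two states are equivalent exactly when they agree on finality
    and their outgoing symbols lead to equivalent states, so equivalence
    classes can be assigned bottom-up.
    """
    trans, finals = build_trie(words)
    class_of = {}   # trie state -> equivalence class id
    registry = {}   # signature -> equivalence class id
    # Child states always get larger ids than their parents, so visiting
    # states in descending order handles every child before its parent.
    for state in sorted(trans, reverse=True):
        signature = (
            state in finals,
            tuple(sorted((ch, class_of[t]) for ch, t in trans[state].items())),
        )
        class_of[state] = registry.setdefault(signature, len(registry))
    return len(registry)


# Hypothetical toy "lexicon": every stem combined with every suffix.
stems = ["ab", "cd", "ef"]
suffixes = ["x", "y", "z"]
full = [s + e for s, e in product(stems, suffixes)]

# An irregular subset: each stem takes a different pair of suffixes.
partial = ["abx", "aby", "cdy", "cdz", "efx", "efz"]

print(len(partial), "words ->", minimal_state_count(partial), "states")  # 6 words -> 8 states
print(len(full), "words ->", minimal_state_count(full), "states")        # 9 words -> 6 states
```

In the full set all three stems share one suffix state, while in the subset each stem needs its own. Accepting the three extra words therefore shrinks the machine, which is the intuition behind letting a super transducer accept some out-of-dictionary words.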