Complexity of the TDNN Acoustic Model with Respect to the HMM Topology

Author:	Aleš Pražák, Jan Vaněk, Josef Psutka
Language:	English
Year of publication:	2020
Subject:	Context model; Time delay neural network; Computer science; Computer Science::Sound; Acoustic model; Speech recognition; Acoustic modeling; HMM topology; Lattice-free MMI; Context (language use); Topology (electrical circuits); Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing); Network topology; Hidden Markov model; Topology; Triphone
Source:	Text, Speech, and Dialogue (TSD), ISBN 9783030583224
Description:	In this paper, we discuss some properties of training acoustic models with a lattice-free version of the maximum mutual information criterion (LF-MMI). The LF-MMI method currently achieves state-of-the-art results on many speech recognition tasks. Key features of the LF-MMI approach include training the DNN without initialization from a cross-entropy system, using a 3-fold reduced frame rate, and using a simpler HMM topology. In a typical LF-MMI training procedure, the conventional 3-state HMM topology was replaced with a special 1-state HMM topology that has different pdfs on the self-loop and forward transitions. In this paper, we discuss both the different types of HMM topologies (conventional 1-, 2- and 3-state topologies) and the advantages of biphone context modeling over the original triphone context or a simpler monophone context. We also address the impact of the subsampling factor on WER.
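As a rough illustration of why the HMM topology interacts with the 3-fold reduced frame rate, the following Python sketch (not the authors' code) encodes each topology as a list of (source state, destination state, pdf class) arcs, where every arc consumes one subsampled frame. It computes the number of pdf classes per phone and the shortest phone duration each topology can represent. The 10 ms input frame shift, the 40-phone inventory, and all function names (`conventional`, `chain_one_state`, `min_frames`, `untied_pdfs`) are hypothetical assumptions for illustration; `untied_pdfs` is only an upper bound before any decision-tree tying.

```python
from collections import deque

# Hypothetical arc representation: (src_state, dst_state, pdf_class).
# Each arc consumes exactly one subsampled frame; the highest-numbered
# state is the non-emitting final state.

def conventional(num_states):
    """Left-to-right n-state HMM: self-loop and forward arc share the state's pdf class."""
    arcs = []
    for s in range(num_states):
        arcs.append((s, s, s))      # self-loop
        arcs.append((s, s + 1, s))  # forward transition to the next state
    return arcs

def chain_one_state():
    """1-state topology with different pdf classes on the self-loop and forward arcs."""
    return [(0, 0, 1), (0, 1, 0)]

def num_pdf_classes(arcs):
    """Number of distinct pdf classes used by one phone model."""
    return len({pdf for _, _, pdf in arcs})

def min_frames(arcs):
    """Breadth-first search: fewest arcs (frames) from state 0 to the final state."""
    final = max(max(src, dst) for src, dst, _ in arcs)
    dist = {0: 0}
    queue = deque([0])
    while queue:
        s = queue.popleft()
        for src, dst, _ in arcs:
            if src == s and dst not in dist:
                dist[dst] = dist[s] + 1
                queue.append(dst)
    return dist[final]

def untied_pdfs(num_phones, context_width, arcs):
    """Upper bound on pdfs before tying: 1 = monophone, 2 = biphone, 3 = triphone."""
    return num_phones ** context_width * num_pdf_classes(arcs)

BASE_SHIFT_MS = 10  # assumed input frame shift
SUBSAMPLING = 3     # 3-fold reduced frame rate from the abstract

topologies = {
    "1-state": conventional(1),
    "2-state": conventional(2),
    "3-state": conventional(3),
    "1-state (LF-MMI)": chain_one_state(),
}

for name, arcs in topologies.items():
    ms = min_frames(arcs) * SUBSAMPLING * BASE_SHIFT_MS
    print(f"{name}: pdf_classes={num_pdf_classes(arcs)}, min_duration_ms={ms}, "
          f"biphone_upper_bound={untied_pdfs(40, 2, arcs)}")
```

Under these assumptions, a conventional 3-state topology cannot represent a phone shorter than 90 ms at the reduced frame rate, while both 1-state variants allow 30 ms; the special 1-state topology keeps this short minimum duration while still using two pdf classes per phone.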
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::3e0954787af912082d6201b9eec10577 http://hdl.handle.net/11025/42718