Complexity of the TDNN Acoustic Model with Respect to the HMM Topology

Authors: Aleš Pražák, Jan Vaněk, Josef Psutka
Language: English
Year of publication: 2020
Subject:
Source: Text, Speech, and Dialogue ISBN: 9783030583224
TDS
Description: In this paper, we discuss some properties of training acoustic models with a lattice-free version of the maximum mutual information criterion (LF-MMI). Currently, the LF-MMI method achieves state-of-the-art results on many speech recognition tasks. Key features of the LF-MMI approach include training the DNN without initialization from a cross-entropy system, a 3-fold reduced frame rate, and a simpler HMM topology. In a typical LF-MMI training procedure, the conventional 3-state HMM topology is replaced with a special 1-stage HMM topology that has different pdfs on the self-loop and forward transitions. We discuss both the different types of HMM topology (conventional 1-, 2- and 3-state) and the advantages of biphone context modeling over the original triphone or a simpler monophone context. We also address the impact of the subsampling factor on WER.
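The link between the reduced frame rate and the simpler topology mentioned in the description can be illustrated with a little arithmetic. The sketch below is not taken from the paper. It assumes a 10 ms base frame shift and a strict left-to-right HMM without skip transitions, so that each state must be occupied for at least one frame. The function name `min_duration_ms` is hypothetical.

```python
def min_duration_ms(num_states, subsampling_factor, base_frame_shift_ms=10.0):
    """Shortest segment a strict left-to-right HMM can cover.

    Assumptions (not from the paper): no skip transitions, so every
    state consumes at least one frame, and a 10 ms base frame shift.
    """
    effective_frame_shift_ms = subsampling_factor * base_frame_shift_ms
    return num_states * effective_frame_shift_ms


if __name__ == "__main__":
    # Compare the full frame rate with the 3-fold reduced frame rate.
    for factor in (1, 3):
        for states in (1, 2, 3):
            duration = min_duration_ms(states, factor)
            print(f"subsampling={factor} states={states} min_duration={duration} ms")
```

Under these assumptions, a 3-state topology at a 3-fold reduced frame rate cannot model a unit shorter than 90 ms, whereas a topology that can be traversed in a single frame needs only 30 ms.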
Database: OpenAIRE