Complexity of the TDNN Acoustic Model with Respect to the HMM Topology
Authors: | Aleš Pražák, Jan Vaněk, Josef Psutka |
---|---|
Language: | English |
Year of publication: | 2020 |
Subject: | Context model; Time delay neural network; Computer science; Computer Science::Sound; Acoustic model; Speech recognition; Acoustic modeling; HMM topology; Lattice-free MMI; Context (language use); Topology (electrical circuits); Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing); Network topology; Hidden Markov model; Topology; Triphone |
Source: | Text, Speech, and Dialogue; ISBN: 9783030583224; TDS |
Description: | In this paper, we discuss some properties of training acoustic models with the lattice-free version of the maximum mutual information criterion (LF-MMI). The LF-MMI method currently achieves state-of-the-art results on many speech recognition tasks. Key features of the LF-MMI approach include training the DNN without initialization from a cross-entropy system, a frame rate reduced by a factor of 3, and a simpler HMM topology. In a typical LF-MMI training procedure, the conventional 3-state HMM topology is replaced with a special 1-state HMM topology that has different pdfs on the self-loop and forward transitions. We compare the different types of HMM topology (conventional 1-, 2- and 3-state topologies) and discuss the advantages of biphone context modeling over both the original triphone context and a simpler monophone context. We also report the impact of the subsampling factor on the word error rate (WER). |
Database: | OpenAIRE |