HMM vs. CTC for Automatic Speech Recognition: Comparison Based on Full-Sum Training from Scratch

Authors: Raissi, Tina; Zhou, Wei; Berger, Simon; Schlüter, Ralf; Ney, Hermann
Year of publication: 2022
Document type: Working Paper
Description: In this work, we compare from-scratch sequence-level cross-entropy (full-sum) training of Hidden Markov Model (HMM) and Connectionist Temporal Classification (CTC) topologies for automatic speech recognition (ASR). Besides accuracy, we further analyze their capability to generate high-quality time alignments between the speech signal and the transcription, which can be crucial for many subsequent applications. Moreover, we propose several methods to improve the convergence of from-scratch full-sum training by addressing the alignment modeling problem. A systematic comparison is conducted on both the Switchboard and LibriSpeech corpora across CTC, posterior HMM with and without transition probabilities, and standard hybrid HMM. We also provide a detailed analysis of both Viterbi forced-alignment and Baum-Welch full-sum occupation probabilities.
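As a minimal sketch of what "full-sum" training means here (not the authors' actual setup, and with purely illustrative tensor shapes and random data): the sequence-level loss marginalizes over all possible alignments via the forward algorithm, rather than committing to a single Viterbi path. For the CTC topology, PyTorch's standard nn.CTCLoss computes exactly this kind of full-sum objective.

```python
import torch
import torch.nn as nn

# Illustrative dimensions (not taken from the paper): T acoustic frames,
# N sequences per batch, C output labels including blank, S target labels.
T, N, C, S = 100, 4, 30, 12

# Frame-wise label log-posteriors from some acoustic encoder (random stand-in here).
log_probs = torch.randn(T, N, C).log_softmax(dim=-1).requires_grad_()
targets = torch.randint(1, C, (N, S), dtype=torch.long)   # label indices; blank = 0
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), S, dtype=torch.long)

# Full-sum (forward-algorithm) loss: -log of the sum over all CTC alignments,
# as opposed to Viterbi training, which keeps only the single best path.
ctc_loss = nn.CTCLoss(blank=0)
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)

# The gradient w.r.t. log_probs is (minus) the soft, Baum-Welch-style
# frame-label occupation probabilities that the abstract refers to.
loss.backward()
```

An analogous HMM full-sum loss would replace the CTC label topology (blank, label repetitions) with an HMM topology and, optionally, transition probabilities, while keeping the same forward-algorithm marginalization over alignments.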
Comment: Accepted for Presentation at IEEE SLT 2022
Database: arXiv