Discriminative and Articulatory Feature-based Pronunciation Models for Conversational Speech Recognition

Author: Jyothi, Preethi
Language: English
Year of publication: 2013
Subject:
Document type: Text
Description: Conversational (or spontaneous) speech is characterized by a great deal of pronunciation variability caused by various artifacts of natural speech (like the speaking style, speaking rate and surrounding words). This pronunciation variability is one of the main factors resulting in the poor performance of automatic speech recognition (ASR) systems on conversational speech. The main goal of this dissertation is to combine linguistic insights from speech production models and discriminative training techniques to tackle such variability. To this end, we introduce two new techniques that are potentially of more general interest. We build on two popular tools that are used in ASR: dynamic Bayesian networks (DBNs) and weighted finite state transducers (WFSTs). First, we propose a discriminative training approach that allows selective training of WFST factors within an ASR system based on WFSTs. Second, we propose a technique to transform DBN models into equivalent WFST cascades that can be further discriminatively trained using our first approach. We employ the above approaches for building improved pronunciation models. We start with a prior DBN model of pronunciation that relates the movements of various articulators of a speaker to the sounds produced. We transform this model into a cascade of WFSTs that are further discriminatively trained to produce a better pronunciation model. We experimentally verify the improved performance by evaluating this model in isolation in a lexical access task. Going beyond isolated evaluation, we also demonstrate the feasibility of our discriminative training technique in an end-to-end recognizer with a phone-based pronunciation model. Finally, we show that the prior DBN model can also be improved significantly by incorporating contextual information. Such pronunciation models with context are also amenable to being incorporated into a WFST-based ASR system using our new discriminative techniques.
Database: Networked Digital Library of Theses & Dissertations