Exploiting Future Word Contexts in Neural Network Language Models for Speech Recognition

Authors:	Jeremy H. M. Wong, Mark J. F. Gales, Xie Chen, Xunying Liu, Anton Ragni, Yu Wang
Contributors:	Chen, X [0000-0001-7423-617X], Liu, X [0000-0001-6725-1160], Wang, Y [0000-0001-9500-081X], Wong, JHM [0000-0003-3742-7510], Apollo - University of Cambridge Repository
Publication year:	2019
Subject:	Context model; Acoustics and Ultrasonics; Computer science; Speech recognition; Recurrent neural network; Conditional probability; Speech processing; Keyword search; Computational Mathematics; Succeeding words; Keyword spotting; Computer Science (miscellaneous); Language model; Electrical and Electronic Engineering; Sentence; Word (computer architecture)
ISSN:	2329-9290
DOI:	10.17863/cam.40469
Description:	Language modelling is a crucial component in a wide range of applications, including speech recognition. Language models (LMs) are usually constructed by splitting a sentence into words and computing the probability of each word based on its word history. This sentence probability calculation, which makes use of conditional probability distributions, assumes that there is little impact from the approximations used in the LMs, including the word history representations and the approaches used to handle finite training data. This motivates examining models that make use of additional information from the sentence. In this work, future word information is used, in addition to the history, to predict the probability of the current word. For recurrent neural network LMs (RNNLMs), this information can be encapsulated in a bi-directional model. However, if used directly, this form of model is computationally expensive when training on large quantities of data, and can be problematic when used with word lattices. This paper proposes a novel neural network language model structure, the succeeding-word RNNLM (su-RNNLM), to address these issues. Instead of using a recurrent unit to capture the complete future word context, a feed-forward unit is used to model a fixed, finite number of succeeding words. This is more efficient in training than bi-directional models and can be applied to lattice rescoring. The generated lattices can be used for downstream applications, such as confusion network decoding and keyword search. Experimental results on speech recognition and keyword spotting tasks illustrate the empirical usefulness of future word information, and the flexibility of the proposed model to represent this information.
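Illustration:	The core idea in the description (a recurrent unit for the word history, plus a feed-forward unit over a fixed window of succeeding words) can be sketched as a toy forward pass. This is a minimal sketch under stated assumptions, not the authors' implementation: the class name `SuRNNLMSketch`, the tanh recurrence, the layer sizes, the zero padding past the sentence end, and the concatenation of the two representations before the output layer are all hypothetical choices made for illustration, and the weights are random rather than trained.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(z):
    """Numerically stable softmax."""
    top = max(z)
    exps = [math.exp(x - top) for x in z]
    total = sum(exps)
    return [e / total for e in exps]


class SuRNNLMSketch:
    """Toy model (hypothetical): recurrent history + feed-forward look-ahead of k words."""

    def __init__(self, vocab_size, dim=8, k=2, seed=0):
        rng = random.Random(seed)
        self.k, self.dim = k, dim
        self.embed = make_matrix(vocab_size, dim, rng)      # word embeddings
        self.w_in = make_matrix(dim, dim, rng)              # current word -> history state
        self.w_rec = make_matrix(dim, dim, rng)             # recurrent weights
        self.w_fut = make_matrix(dim, k * dim, rng)         # feed-forward unit over k future words
        self.w_out = make_matrix(vocab_size, 2 * dim, rng)  # output layer
        self.pad = [0.0] * dim                              # assumed padding past the sentence end

    def word_probs(self, sentence):
        """For each position t, return a distribution over the word at t."""
        h = [0.0] * self.dim  # history state: summarises words before t
        dists = []
        for t in range(len(sentence)):
            # Fixed window: only the next k words are visible, never the full future.
            window = []
            for j in range(t + 1, t + 1 + self.k):
                window += self.embed[sentence[j]] if j < len(sentence) else self.pad
            f = [math.tanh(x) for x in matvec(self.w_fut, window)]
            dists.append(softmax(matvec(self.w_out, h + f)))
            # Fold the current word into the history state for the next position.
            a = matvec(self.w_in, self.embed[sentence[t]])
            b = matvec(self.w_rec, h)
            h = [math.tanh(x + y) for x, y in zip(a, b)]
        return dists
```

Because the look-ahead is a fixed window of `k` words, the prediction at a given position does not depend on anything further ahead. That bounded dependence is the property the description relies on when it says the model can be applied to lattice rescoring, where a fully bi-directional model would need the complete future of every path.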
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::1ece13c09e1e6fbf1664ab3756ef2ed8