General Sequence Teacher–Student Learning

Authors: Jeremy H. M. Wong, Yu Wang, Mark J. F. Gales
Publication year: 2019
Subject:
Source: IEEE/ACM Transactions on Audio, Speech, and Language Processing. 27:1725-1736
ISSN: 2329-9304, 2329-9290
DOI: 10.1109/taslp.2019.2929859
Description: In automatic speech recognition, performance gains can often be obtained by combining an ensemble of multiple models. However, this can be computationally expensive when performing recognition. Teacher–student learning alleviates this cost by training a single student model to emulate the combined ensemble behaviour. Only this student needs to be used for recognition. Previously investigated teacher–student criteria often limit the forms of diversity allowed in the ensemble, and only propagate information from the teachers to the student at the frame level. This paper addresses both of these issues by examining teacher–student learning within a sequence-level framework, and assessing the flexibility that these approaches offer. Various sequence-level teacher–student criteria are examined in this work, to propagate sequence posterior information. A training criterion based on the Kullback–Leibler (KL) divergence between context-dependent state sequence posteriors is proposed that allows for a diversity of state cluster sets to be present in the ensemble. This criterion is shown to be an upper bound to a more general KL-divergence between word sequence posteriors, which places even fewer restrictions on the ensemble diversity, but whose gradient can be expensive to compute. These methods are evaluated on the Augmented Multi-party Interaction (AMI) meeting transcription and MGB-3 television broadcast audio tasks.
Database: OpenAIRE