Independent language modeling architecture for end-to-end ASR
Author: | Eng Siong Chng, Haizhou Li, Zhiping Zeng, Van Tung Pham, Bin Ma, Yerbolat Khassanov, Haihua Xu, Chongjia Ni |
---|---|
Language: | English |
Publication year: | 2019 |
Subject: |
FOS: Computer and information sciences; Computer Science - Computation and Language; Computer science; Speech recognition; Word error rate; 020206 networking & telecommunications; 02 engineering and technology; 010501 environmental sciences; 01 natural sciences; Reduction (complexity); End-to-end principle; Audio and Speech Processing (eess.AS); FOS: Electrical engineering, electronic engineering, information engineering; 0202 electrical engineering, electronic engineering, information engineering; Language model; Architecture; Joint (audio engineering); Computation and Language (cs.CL); Encoder; 0105 earth and related environmental sciences; Electrical Engineering and Systems Science - Audio and Speech Processing |
Source: | ICASSP |
Description: | The attention-based end-to-end (E2E) automatic speech recognition (ASR) architecture allows for joint optimization of acoustic and language models within a single network. However, in a vanilla E2E ASR architecture, the decoder sub-network (subnet), which incorporates the role of the language model (LM), is conditioned on the encoder output. As a result, the acoustic encoder and the language model are entangled, which prevents the language model from being trained separately on external text data. To address this problem, we propose a new architecture that separates the decoder subnet from the encoder output. In this way, the decoupled subnet becomes an independently trainable LM subnet, which can easily be updated using external text data. We study two strategies for updating the new architecture. Experimental results show that 1) the independent LM architecture benefits from external text data, achieving relative reductions of 9.3% in character error rate on the Mandarin HKUST dataset and 22.8% in word error rate on the English NSC dataset; and 2) the proposed architecture works well with an external LM and generalizes to different amounts of labelled data. |
Database: | OpenAIRE |
External link: |
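
The decoupling described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the record does not say how the LM subnet output is recombined with the encoder output, so the concatenate-and-project `JointLayer`, the dot-product `attend` function, and every name and size below are assumptions made for illustration. The point it shows is that the LM subnet only ever sees the token history, so it can be updated from text alone while the acoustic side stays untouched.

```python
import math
import random

random.seed(0)
VOCAB, HID = 5, 4  # toy sizes, chosen arbitrarily for this sketch


def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(z):
    top = max(z)
    e = [math.exp(x - top) for x in z]
    total = sum(e)
    return [x / total for x in e]


class LMSubnet:
    """Hypothetical recurrent LM subnet: sees token history, never encoder output."""

    def __init__(self):
        self.embed = rand_matrix(VOCAB, HID)
        self.w_h = rand_matrix(HID, HID)
        self.w_out = rand_matrix(VOCAB, HID)

    def states(self, tokens):
        h, out = [0.0] * HID, []
        for t in tokens:
            rec = matvec(self.w_h, h)
            h = [math.tanh(e + r) for e, r in zip(self.embed[t], rec)]
            out.append(h)
        return out

    def next_token_probs(self, h):
        return softmax(matvec(self.w_out, h))

    def nll(self, tokens):
        """Negative log-likelihood of a text-only token sequence."""
        pairs = zip(self.states(tokens[:-1]), tokens[1:])
        return sum(-math.log(self.next_token_probs(h)[y]) for h, y in pairs)

    def text_only_update(self, tokens, lr=0.5):
        """One SGD pass on text alone; for brevity only the output layer is updated."""
        for h, y in zip(self.states(tokens[:-1]), tokens[1:]):
            p = self.next_token_probs(h)
            for k in range(VOCAB):
                grad = p[k] - (1.0 if k == y else 0.0)  # softmax cross-entropy gradient
                for i in range(HID):
                    self.w_out[k][i] -= lr * grad * h[i]


def attend(query, enc_out):
    """Dot-product attention over encoder frames (assumed form)."""
    weights = softmax([sum(q * e for q, e in zip(query, f)) for f in enc_out])
    return [sum(w * f[i] for w, f in zip(weights, enc_out)) for i in range(HID)]


class JointLayer:
    """Assumed fusion: concatenate LM state and acoustic context, then project."""

    def __init__(self):
        self.w = rand_matrix(VOCAB, 2 * HID)

    def probs(self, lm_state, context):
        return softmax(matvec(self.w, lm_state + context))


def asr_probs(lm, joint, enc_out, history):
    h = lm.states(history)[-1]   # depends on text history only
    c = attend(h, enc_out)       # acoustic information enters here
    return joint.probs(h, c)


lm, joint = LMSubnet(), JointLayer()
text = [1, 2, 3, 1, 2, 3]        # stand-in for external text data
nll_before = lm.nll(text)
for _ in range(50):
    lm.text_only_update(text)    # no audio or encoder involved
nll_after = lm.nll(text)
enc_out = rand_matrix(6, HID)    # stand-in for encoder output frames
posterior = asr_probs(lm, joint, enc_out, [1, 2])
```

In a vanilla attention decoder, the recurrent state itself would be conditioned on the encoder context, so a text-only update like `text_only_update` would not be possible without supplying some acoustic input.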