SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR
Author: Zhao, Qiuming; Sun, Guangzhi; Zhang, Chao; Xu, Mingxing; Zheng, Thomas Fang
Publication Year: 2024
Subject:
Document Type: Working Paper
Description: Mixture-of-experts (MoE) models have achieved excellent results in many tasks. However, conventional MoE models are often very large, making them challenging to deploy on resource-constrained edge devices. In this paper, we propose a novel speaker adaptive mixture of LoRA experts (SAML) approach, which uses low-rank adaptation (LoRA) modules as experts to reduce the number of trainable parameters in MoE. Specifically, SAML is applied to quantised, personalised end-to-end automatic speech recognition (ASR) models, combining test-time speaker adaptation to improve the performance of heavily compressed models in speaker-specific scenarios. Experiments were performed on the LibriSpeech and TED-LIUM 3 corpora. Remarkably, with a 7x reduction in model size, 29.1% and 31.1% relative word error rate reductions were achieved on the quantised Whisper model and the Conformer-based attention-based encoder-decoder ASR model respectively, compared to the original full-precision models. Comment: 5 pages, accepted by Interspeech 2024. arXiv admin note: substantial text overlap with arXiv:2309.09136. (An illustrative mixture-of-LoRA-experts sketch follows this record.)
Database: arXiv
External Link:
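
As a rough illustration of the idea summarised in the description (LoRA modules acting as experts inside a MoE layer on top of a frozen, compressible base model), the following PyTorch sketch shows one way such a layer could look. The class name `MoLoRALinear`, the input-dependent gate, the expert count, and the rank are illustrative assumptions, not the authors' implementation; in SAML the routing is speaker-adaptive and the base model is quantised, which this sketch only notes in comments.

```python
# Minimal sketch of a mixture-of-LoRA-experts layer (assumed design, not the paper's code).
import torch
import torch.nn as nn


class MoLoRALinear(nn.Module):
    """Frozen base linear layer augmented with a gated mixture of LoRA experts."""

    def __init__(self, in_features, out_features, num_experts=4, rank=8):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)  # base weights stay frozen (and could be quantised)
        self.base.bias.requires_grad_(False)
        # Each expert is a low-rank pair (A: in->r, B: r->out); B starts at zero
        # so the layer initially reproduces the base model exactly.
        self.lora_A = nn.Parameter(torch.randn(num_experts, in_features, rank) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(num_experts, rank, out_features))
        # Simple input-dependent gate over experts; a speaker embedding could be
        # used here instead for speaker-adaptive routing as in SAML.
        self.gate = nn.Linear(in_features, num_experts)

    def forward(self, x):  # x: (batch, time, in_features)
        weights = torch.softmax(self.gate(x), dim=-1)           # (B, T, E)
        # Low-rank update per expert: x @ A_e @ B_e, then mix by gate weights.
        low = torch.einsum("btd,edr->bter", x, self.lora_A)     # (B, T, E, r)
        upd = torch.einsum("bter,ero->bteo", low, self.lora_B)  # (B, T, E, out)
        delta = (weights.unsqueeze(-1) * upd).sum(dim=2)        # (B, T, out)
        return self.base(x) + delta


if __name__ == "__main__":
    layer = MoLoRALinear(256, 256)
    y = layer(torch.randn(2, 50, 256))
    print(y.shape)  # torch.Size([2, 50, 256])
```

Only the LoRA matrices and the gate are trainable here, which is the sense in which a mixture of LoRA experts keeps the adaptation parameter count small relative to a conventional MoE with full-rank expert weights.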