Study on Acoustic Model Personalization in a Context of Collaborative Learning Constrained by Privacy Preservation
Author: | Marc Tommasi, Salima Mdhaffar, Yannick Estève |
---|---|
Contributors: | Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, Machine Learning in Information Networks (MAGNET), Inria Lille - Nord Europe, Institut National de Recherche en Informatique et en Automatique (Inria)-Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL), Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS), ANR-18-CE23-0018, DEEP-PRIVACY, Apprentissage distribué, personnalisé, préservant la privacité pour le traitement de la parole (2018) |
Language: | English |
Year of publication: | 2021 |
Subject: | Personalization; Computer science; Privacy protection; Automatic speech recognition; Acoustic model; 020206 networking & telecommunications; Context (language use); Collaborative learning; 02 engineering and technology; Federated learning; [INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL]; Human–computer interaction; Similarity (psychology); 0202 electrical engineering, electronic engineering, information engineering; Privacy-protection; 020201 artificial intelligence & image processing; Acoustic models |
Source: | Speech and Computer: 23rd International Conference, SPECOM 2021, St. Petersburg, Russia, September 27–30, 2021, Proceedings, pp. 426-436, ⟨10.1007/978-3-030-87802-3_39⟩. ISBN: 9783030878016 |
Description: | This paper investigates different approaches to improving the performance of a speech recognition system for a given speaker by using no more than 5 min of speech from this speaker, and without exchanging data from other users or speakers. Inspired by the federated learning paradigm, we consider speakers who each have access to a personalized database of their own speech, learn an acoustic model, and collaborate with other speakers in a network to improve their model. We explore several local personalization strategies that differ in how aggregation is performed. We study the impact of adaptively selecting a subset of speakers' models based on a notion of similarity. We also investigate the effect of weighted averaging of fine-tuned and global models. In our approach, only neural acoustic model parameters are exchanged; no audio data is exchanged. By avoiding communication of personal data, the proposed approach tends to preserve the privacy of speakers. Experiments conducted on the TEDLIUM 3 dataset show that the best improvement is obtained by averaging a subset of different acoustic models fine-tuned on several user datasets. Applied to HMM/TDNN acoustic models, our approach quickly and significantly improves ASR performance in terms of WER (for instance, on one of our two evaluation datasets, from 14.84% to 13.45% with less than 5 min of speech per speaker). |
Database: | OpenAIRE |
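
The two aggregation ideas named in the description (averaging a similarity-selected subset of speakers' models, and weighted averaging of a fine-tuned model with a global one) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: models are represented as flat parameter lists, cosine similarity stands in for the unspecified "notion of similarity", and the names `select_similar`, `average`, and `interpolate` as well as the parameters `k` and `alpha` are hypothetical.

```python
import math
from typing import Dict, List

# Assumption: each acoustic model is flattened into one list of parameters.
Params = List[float]


def cosine_similarity(a: Params, b: Params) -> float:
    """Cosine similarity between two parameter vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_similar(own: Params, peers: Dict[str, Params], k: int) -> List[str]:
    """Return the names of the k peer models most similar to `own`."""
    ranked = sorted(
        peers,
        key=lambda name: cosine_similarity(own, peers[name]),
        reverse=True,
    )
    return ranked[:k]


def average(models: List[Params]) -> Params:
    """Element-wise mean of several parameter vectors of equal length."""
    n = len(models)
    return [sum(values) / n for values in zip(*models)]


def interpolate(fine_tuned: Params, global_model: Params, alpha: float) -> Params:
    """Weighted average: alpha * fine_tuned + (1 - alpha) * global_model."""
    return [alpha * f + (1.0 - alpha) * g for f, g in zip(fine_tuned, global_model)]


# Toy example: only parameters are shared between speakers, never audio.
own_model = [1.0, 0.0]
peer_models = {"a": [0.9, 0.1], "b": [0.0, 1.0], "c": [1.0, 0.2]}
chosen = select_similar(own_model, peer_models, k=2)
subset_average = average([peer_models[name] for name in chosen])
personalized = interpolate(own_model, subset_average, alpha=0.5)
```

In this toy setup the peer `"b"`, whose parameters point in a different direction, is left out of the average; how the subset size and the interpolation weight should be chosen in practice is not stated in the record.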