Label Aware Speech Representation Learning For Language Identification

Authors:	Vashishth, Shikhar; Bharadwaj, Shikhar; Ganapathy, Sriram; Bapna, Ankur; Ma, Min; Han, Wei; Axelrod, Vera; Talukdar, Partha
Year of publication:	2023
Subjects:	Computer Science - Computation and Language; Computer Science - Machine Learning; Computer Science - Sound; Electrical Engineering and Systems Science - Audio and Speech Processing
Document type:	Working Paper
Abstract:	Speech representation learning approaches for non-semantic tasks such as language recognition have explored either supervised embedding extraction with a classifier model or self-supervised representation learning on raw data. In this paper, we propose a novel framework that combines self-supervised representation learning with language label information in the pre-training task. This framework, termed Label Aware Speech Representation (LASR) learning, uses a triplet-based objective function to incorporate language labels alongside the self-supervised loss function. The speech representations are then fine-tuned for the downstream task. Language recognition experiments are performed on two public datasets: FLEURS and Dhwani. In these experiments, we show that the proposed LASR framework improves over state-of-the-art systems for language identification. We also analyze the robustness of the LASR approach to noisy or missing labels, as well as its application to multilingual speech recognition tasks.
Comment:	Accepted at Interspeech 2023
Database:	arXiv
External link:	http://arxiv.org/abs/2306.04374
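The abstract describes adding a triplet-based objective over language labels to a self-supervised loss. The sketch below illustrates that general idea under stated assumptions and is not the authors' implementation: it assumes Euclidean distance between utterance embeddings, a hinge margin, exhaustive enumeration of (anchor, same-language positive, different-language negative) triplets within a batch, and a simple weighted sum with a hypothetical weight `alpha`. The function names `triplet_loss`, `batch_triplet_loss`, and `combined_loss` are illustrative only.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 1.0) -> float:
    """Hinge triplet loss: the anchor should be closer to a same-language
    utterance (positive) than to a different-language one (negative)
    by at least `margin` (assumed hyperparameter)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def batch_triplet_loss(embeddings: List[Sequence[float]], labels: List[str],
                       margin: float = 1.0) -> float:
    """Average triplet loss over every valid triplet in a batch, using the
    language labels to pick positives and negatives (assumed strategy)."""
    losses = []
    n = len(embeddings)
    for i in range(n):
        for j in range(n):
            # Positive: a different utterance with the same language label.
            if i == j or labels[i] != labels[j]:
                continue
            for k in range(n):
                # Negative: any utterance with a different language label.
                if labels[k] == labels[i]:
                    continue
                losses.append(triplet_loss(embeddings[i], embeddings[j], embeddings[k], margin))
    return sum(losses) / len(losses) if losses else 0.0


def combined_loss(ssl_loss: float, triplet: float, alpha: float = 0.5) -> float:
    """Hypothetical combination of the self-supervised loss and the
    label-aware triplet term; `alpha` is an assumed weight."""
    return ssl_loss + alpha * triplet


if __name__ == "__main__":
    emb = [[0.0, 0.0], [2.0, 0.0], [0.0, 1.0]]
    langs = ["hi", "hi", "ta"]
    trip = batch_triplet_loss(emb, langs)
    print(round(trip, 3), round(combined_loss(ssl_loss=1.0, triplet=trip), 3))
```

In this toy batch the Tamil utterance sits closer to the first Hindi utterance than the second Hindi utterance does, so the triplet term is positive and pushes the embeddings toward grouping by language. If the labels were noisy or missing, the enumeration would simply produce wrong or fewer triplets, which is one way to see why the abstract singles out robustness to label noise.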