SIGTYP 2021 Shared Task: Robust Spoken Language Identification

Authors:	Salesky, Elizabeth; Abdullah, Badr M.; Mielke, Sabrina J.; Klyachko, Elena; Serikov, Oleg; Ponti, Edoardo; Kumar, Ritesh; Cotterell, Ryan; Vylomova, Ekaterina
Year of publication:	2021
Subject:	Computer Science - Computation and Language; Computer Science - Sound; Electrical Engineering and Systems Science - Audio and Speech Processing
Document type:	Working Paper
Description:	While language identification is a fundamental speech and language processing task, for many languages and language families it remains a challenging task. For many low-resource and endangered languages this is in part due to resource availability: where larger datasets exist, they may be single-speaker or have different domains than desired application scenarios, demanding a need for domain and speaker-invariant language identification systems. This year's shared task on robust spoken language identification sought to investigate just this scenario: systems were to be trained on largely single-speaker speech from one domain, but evaluated on data in other domains recorded from speakers under different recording circumstances, mimicking realistic low-resource scenarios. We see that domain and speaker mismatch proves very challenging for current methods which can perform above 95% accuracy in-domain, which domain adaptation can address to some degree, but that these conditions merit further investigation to make spoken language identification accessible in many scenarios.
Comment:	The first three authors contributed equally
Database:	arXiv
External link:	http://arxiv.org/abs/2106.03895