Author: |
Wang, Quan; Yu, Yang; Pelecanos, Jason; Huang, Yiling; Moreno, Ignacio Lopez |
Year of publication: |
2022 |
Subject: |
|
Document type: |
Working Paper |
Description: |
In this paper, we introduce a novel language identification system based on conformer layers. We propose an attentive temporal pooling mechanism that lets the model carry information across long-form audio in a recurrent form, so that inference can be performed in a streaming fashion. Additionally, we investigate two domain adaptation approaches that adapt an existing language identification model to a new domain without retraining the model parameters. We perform a comparative study of different model topologies under different model size constraints, and find that conformer-based models significantly outperform LSTM- and transformer-based models. Our experiments also show that attentive temporal pooling and domain adaptation improve model accuracy. |
Database: |
arXiv |
External link: |
|