A Thresholded Discriminative Metric Learning Approach for Deep Speaker Recognition
Author: | Yeh, Yin-Cheng, 葉胤呈 |
---|---|
Year of publication: | 2018 |
Document type: | 學位論文 ; thesis |
Description: | 106 Speaker recognition has been widely used in biometric security applications for decades. With deep learning thriving today, deep models have outperformed traditional probability-based models in many speaker recognition applications. However, compared with results on studio-quality audio samples, the performance of deep models still fluctuates dramatically when background noise is present in real-world scenarios. In this thesis, we aim to build a robust speaker identification, verification, and clustering system and to address the degradation caused by background noise. More specifically, the deep model is refined from two perspectives: the data pre-processing stage and the model training stage. In the data pre-processing stage, noise datasets and environment filters are used to augment the data, helping the model adapt to noisy environments and preventing over-fitting. In the model training stage, a classification model is used as the initial model for further embedding training. Next, we apply our proposed embedding optimization approach, threshold center loss, to further discriminate between speakers and obtain a noise-resistant model for the speaker verification and clustering tasks. In summary, the model achieves a 6.48% equal error rate and a speaker clustering accuracy of more than 90% when the number of speakers is fewer than 20 on the VoxCeleb dataset. |
Database: | Networked Digital Library of Theses & Dissertations |
External link: |
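
The abstract reports verification quality as an equal error rate (EER), the operating point at which the false acceptance rate and the false rejection rate coincide. The following is a minimal sketch of how an EER can be estimated from trial scores. It is not taken from the thesis: the function name `equal_error_rate`, the score convention (higher means "same speaker"), and the threshold sweep over observed scores are all assumptions made for illustration.

```python
import numpy as np


def equal_error_rate(scores, labels):
    """Estimate the EER and its threshold for a set of verification trials.

    Hypothetical helper, not the thesis's evaluation code.
    scores: similarity per trial, higher means "same speaker" (assumption).
    labels: 1 for a same-speaker (target) trial, 0 for an impostor trial.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n_target = int(labels.sum())
    n_impostor = labels.size - n_target
    if n_target == 0 or n_impostor == 0:
        raise ValueError("need at least one target and one impostor trial")

    # Try every observed score as a threshold: a trial is accepted if score >= t.
    best = None
    for t in np.unique(scores):
        accept = scores >= t
        far = np.sum(accept & (labels == 0)) / n_impostor  # false acceptance rate
        frr = np.sum(~accept & (labels == 1)) / n_target   # false rejection rate
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, float(t))
    return best[1], best[2]
```

With finitely many trials the two error rates rarely meet exactly, so this sketch reports their average at the threshold where they are closest. Interpolating along the ROC curve is a common alternative.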