A Study and Implementation on Speaker Verification Based on Deep Learning

Author: Zhengmin Xu, 許正忞
Publication year: 2018
Document type: Thesis
Description: 107
This thesis studies and implements several text-independent speaker verification systems based on deep learning. Acoustic features produced by speech front-end processing (such as MFCC) are used as the input, and neural networks are trained for speaker identification or speaker clustering. After training, part of the neural network serves as a feature extractor that derives a speaker feature from a given utterance. For each enrolled speaker, the trained network extracts a speaker feature from each of his or her utterances, and the average of these feature vectors is used as the speaker model. In the verification phase, the same network extracts a speaker feature from the test utterance, and the cosine similarity between this feature and the model of the claimed speaker is computed. If the similarity exceeds a predefined threshold, the test speaker is accepted; otherwise he or she is rejected. Various neural network architectures were tried, and experiments were conducted on the 8conv portion of the NIST SRE 2010 corpus. The experimental results show that the systems presented in this thesis have a clear advantage over the i-vector system when the test utterance is short. Specifically, when enrolling with full-length utterances and verifying with utterances of only 2 seconds, the best system in this thesis achieves an equal error rate (EER) of 9.75%, almost half that of the i-vector system. For speaker identification, the best system in this thesis reaches an accuracy of more than 85%.
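The enrollment and verification steps described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis implementation: it assumes the speaker features have already been extracted by the network and are available as plain lists of floats. The function names (`enroll`, `cosine_similarity`, `verify`), the toy 2-dimensional vectors, and the threshold value 0.5 are all hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def enroll(embeddings: Sequence[Sequence[float]]) -> Vector:
    """Average per-utterance speaker features into one speaker model."""
    if not embeddings:
        raise ValueError("need at least one enrollment embedding")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same dimension")
    count = len(embeddings)
    return [sum(e[i] for e in embeddings) / count for i in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def verify(test_embedding: Sequence[float],
           speaker_model: Sequence[float],
           threshold: float) -> bool:
    """Accept the claimed identity if the similarity exceeds the threshold."""
    return cosine_similarity(test_embedding, speaker_model) > threshold


# Toy data (assumed): two enrollment features for one speaker.
model = enroll([[1.0, 0.0], [0.0, 1.0]])   # averaged speaker model
target_trial = [1.0, 1.0]                  # points the same way as the model
impostor_trial = [1.0, -1.0]               # orthogonal to the model

accepted = verify(target_trial, model, threshold=0.5)
rejected = verify(impostor_trial, model, threshold=0.5)
```

In practice the threshold would be tuned on held-out trials, for example at the operating point where the false-acceptance and false-rejection rates are equal, which is the EER quoted in the abstract.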
Database: Networked Digital Library of Theses & Dissertations