HiLAM-state discriminative multi-task deep neural network in dynamic time warping framework for text-dependent speaker verification

Authors: Mohammad Azharuddin Laskar, Rabul Hussain Laskar
Publication year: 2020
Subject:
Source: Speech Communication. 121:29-43
ISSN: 0167-6393
DOI: 10.1016/j.specom.2020.03.007
Description: This paper builds on a multi-task Deep Neural Network (DNN), which provides an utterance-level feature representation called the j-vector, to implement a Text-dependent Speaker Verification (TDSV) system. This technique exploits the speaker idiosyncrasies associated with individual pass-phrases. However, speaker information is known to be characteristic of more specific speech units, so important speaker identity traits are likely to be averaged out if speaker information is treated as a coarse entity spread uniformly across the whole pass-phrase. This work attempts to overcome this limitation and devises a technique to leverage the finer speaker traits. It proposes to align the training data for the multi-task DNN using the Hierarchical Multi-Layer Acoustic Model (HiLAM). HiLAM is an HMM-based text-dependent model that defines refined segments of a pass-phrase using Gaussian Mixture Model (GMM) states. This helps to exploit the speaker idiosyncrasies associated with finer and more specific segments of speech. Also, because HiLAM is built using the particular text in question, this alignment technique automatically accounts for the exact context of the speech units in the pass-phrase concerned. The proposed technique is found to improve the performance of the system significantly. Integrating Dynamic Time Warping (DTW) with this technique leads to a further improvement. Experiments were conducted on Part 1 of the RSR2015, RedDots, and NITS-TD databases. The best-performing proposed system achieves a relative Equal Error Rate (EER) reduction of up to 50.98% with respect to the baseline j-vector-based system for the overall test condition on the RSR2015 database.
Database: OpenAIRE
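
Note on DTW: the description mentions Dynamic Time Warping without saying how it is integrated into the system. As a reminder of the general idea only, here is a minimal sketch of textbook DTW between two sequences of feature frames. The function name `dtw_distance`, the Euclidean local cost, and the three-way step pattern are assumptions made for illustration; they are not taken from the paper and may differ from what the authors used.

```python
import math

def dtw_distance(seq_a, seq_b):
    """Textbook DTW cost between two sequences of equal-dimension frames.

    Illustrative sketch only (hypothetical helper, not the paper's method).
    Each frame is a sequence of floats; the local cost is Euclidean distance.
    """
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")

    inf = float("inf")
    # acc[i][j] = minimal accumulated cost of aligning the first i frames
    # of seq_a with the first j frames of seq_b.
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(seq_a[i - 1], seq_b[j - 1])
            # Allowed steps: advance in seq_a, advance in seq_b, or both.
            acc[i][j] = local + min(acc[i - 1][j],
                                    acc[i][j - 1],
                                    acc[i - 1][j - 1])
    return acc[n][m]


# Toy one-dimensional frames: the second sequence repeats a frame,
# which DTW can absorb by warping the time axis.
reference = [(1.0,), (2.0,), (3.0,)]
stretched = [(1.0,), (2.0,), (2.0,), (3.0,)]
print(dtw_distance(reference, stretched))
```

Because the alignment is allowed to stretch or compress time, two utterances of the same pass-phrase spoken at different rates can be compared frame by frame, which is the property that makes DTW a natural fit for text-dependent verification.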