Utterance partitioning for speaker recognition: an experimental review and analysis with new findings under GMM-SVM framework

Authors: Hemant A. Patil, Nirmalya Sen, Krothapalli Sreenivasa Rao, T. K. Basu, Shyamal Kumar Das Mandal, Md. Sahidullah
Contributors: R. H. Sapat College of Engineering Management Studies & Research; Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est; Institut National de Recherche en Informatique et en Automatique (Inria); Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD); Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA); Centre National de la Recherche Scientifique (CNRS); Université de Lorraine (UL); Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT); Indian Institute of Technology Kharagpur (IIT Kharagpur)
Year of publication: 2021
Subject:
Signal Processing (eess.SP)
FOS: Computer and information sciences
Computer Science - Machine Learning
Linguistics and Language
Boosting (machine learning)
Computer science
Speech recognition
ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION
02 engineering and technology
Speaker Recognition
Language and Linguistics
Machine Learning (cs.LG)
030507 speech-language pathology & audiology
03 medical and health sciences
GMM-UBM Classifier
[INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]
[INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing
Classifier (linguistics)
FOS: Electrical engineering, electronic engineering, information engineering
0202 electrical engineering, electronic engineering, information engineering
Electrical Engineering and Systems Science - Signal Processing
Duration (project management)
Short Test Utterance
Perspective (graphical)
020206 networking & telecommunications
Speaker recognition
Mixture model
GMM-SVM Classifier
Human-Computer Interaction
Support vector machine
ComputingMethodologies_PATTERNRECOGNITION
Utterance Partitioning
Duration Variability
Computer Vision and Pattern Recognition
0305 other medical science
Software
Utterance
Source: International Journal of Speech Technology
International Journal of Speech Technology, Springer Verlag, In press, ⟨10.1007/s10772-021-09862-8⟩
International Journal of Speech Technology, 2021, 24, pp.1067-1088. ⟨10.1007/s10772-021-09862-8⟩
ISSN: 1572-8110
1381-2416
DOI: 10.1007/s10772-021-09862-8
Description: The performance of a speaker recognition system is highly dependent on the amount of speech used in enrollment and test. This work presents a detailed experimental review and analysis of the GMM-SVM based speaker recognition system in the presence of duration variability. This article also reports a comparison of the performance of the GMM-SVM classifier with its precursor technique, the Gaussian mixture model-universal background model (GMM-UBM) classifier, in the presence of duration variability. The goal of this research work is not to propose a new algorithm for improving speaker recognition performance in the presence of duration variability. Rather, the main focus of this work is on utterance partitioning (UP), a commonly used strategy to compensate for duration variability. We have analysed in detail the impact of training utterance partitioning on speaker recognition performance under the GMM-SVM framework. We further investigate why utterance partitioning is important for boosting speaker recognition performance. We also show in which cases utterance partitioning is useful and in which it is not. Our study reveals that utterance partitioning does not reduce the data imbalance problem of the GMM-SVM classifier, contrary to what an earlier study claimed. In addition, we discuss the impact of parameters such as the number of Gaussians, the supervector length, and the amount of splitting required to obtain better performance in short- and long-duration test conditions from a speech duration perspective. We have performed the experiments with telephone speech from the POLYCOST corpus, consisting of 130 speakers.
Database: OpenAIRE