Quality measures for speaker verification with short utterances

Authors:	Sahidullah, Goutam Saha, Arnab Poddar
Contributors:	Indian Institute of Technology Kharagpur (IIT Kharagpur), Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria), Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Université de Lorraine (UL), Centre National de la Recherche Scientifique (CNRS)
Year of publication:	2019
Subject:	FOS: Computer and information sciences; Computer Science - Machine Learning; Computer science; Computer Vision and Pattern Recognition (cs.CV); Speech recognition; Posterior probability; Computer Science - Computer Vision and Pattern Recognition; Word error rate; Total Variability; 02 engineering and technology; Machine Learning (cs.LG); [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; Reduction (complexity); [INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing; [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]; Gaussian Mixture Model (GMM); Artificial Intelligence; 0202 electrical engineering, electronic engineering, information engineering; System Fusion; Quality (business); Electrical and Electronic Engineering; Quality Measure; Applied Mathematics; Voice Authentication; [INFO.INFO-MM]Computer Science [cs]/Multimedia [cs.MM]; Process (computing); 020206 networking & telecommunications; Speaker Verification; Speaker recognition; Identity Vector (i-vector); Computational Theory and Mathematics; Signal Processing; NIST; Duration Variability; 020201 artificial intelligence & image processing; Computer Vision and Pattern Recognition; Statistics, Probability and Uncertainty; Short Utterances; Universal Background Model (UBM); [SPI.SIGNAL]Engineering Sciences [physics]/Signal and Image processing; Sufficient statistic
Source:	Digital Signal Processing, Elsevier, 2019, 88, pp. 66-79. ⟨10.1016/j.dsp.2019.01.023⟩
ISSN:	1051-2004, 1095-4333
DOI:	10.1016/j.dsp.2019.01.023
Description:	The performance of automatic speaker verification (ASV) systems degrades as the amount of speech used for enrollment and verification is reduced. Combining multiple systems based on different features and classifiers considerably reduces the speaker verification error rate for short utterances. This work attempts to incorporate supplementary information into the system combination process, using the quality of the estimated model parameters as that supplementary information. We introduce a class of novel quality measures formulated using the zero-order sufficient statistics computed during the i-vector extraction process. We use the proposed quality measures as side information for combining ASV systems based on the Gaussian mixture model-universal background model (GMM-UBM) and i-vector approaches. The proposed methods demonstrate considerable improvement in speaker recognition performance on the NIST SRE corpora, especially in short-duration conditions. We have also observed improvement over existing systems based on different duration-based quality measures.
Comment:	Accepted for publication in Digital Signal Processing: A Review Journal
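The description refers to zero-order sufficient statistics and to quality-aware system combination without giving formulas. The Python sketch below illustrates the general idea under stated assumptions only. It computes the standard zero-order Baum-Welch statistics N_c = Σ_t γ_t(c) of an utterance against a diagonal-covariance GMM-UBM (the statistics used in i-vector extraction), derives a quality value from them, and uses that value to weight a GMM-UBM score against an i-vector score. The entropy-based measure `occupancy_entropy_quality` and the sigmoid fusion rule `quality_weighted_fusion` (with placeholder parameters `alpha` and `beta`) are hypothetical illustrations, not the quality measures or the fusion scheme proposed in the paper.

```python
import math
from typing import List, Sequence


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log-density of a diagonal-covariance Gaussian evaluated at one feature frame."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def zero_order_stats(frames, weights, means, variances) -> List[float]:
    """Zero-order Baum-Welch statistics N_c = sum_t gamma_t(c) against a GMM-UBM."""
    num_comp = len(weights)
    stats = [0.0] * num_comp
    for x in frames:
        # Per-component log joint likelihoods, then posteriors gamma_t(c) via log-sum-exp.
        log_joint = [math.log(weights[c]) + log_gaussian_diag(x, means[c], variances[c])
                     for c in range(num_comp)]
        peak = max(log_joint)
        unnorm = [math.exp(lj - peak) for lj in log_joint]
        total = sum(unnorm)
        for c in range(num_comp):
            stats[c] += unnorm[c] / total
    return stats


def occupancy_entropy_quality(stats: Sequence[float]) -> float:
    """HYPOTHETICAL quality measure in [0, 1]: normalized entropy of the occupancies.

    Assumes at least two components and a non-empty utterance. Values near 1 mean
    the frames spread over many UBM components; values near 0 mean only a few
    components received any data.
    """
    total = sum(stats)
    probs = [n / total for n in stats if n > 0.0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(stats))


def quality_weighted_fusion(score_gmm: float, score_ivec: float, quality: float,
                            alpha: float = 4.0, beta: float = -2.0) -> float:
    """HYPOTHETICAL fusion rule: the weight on the i-vector score rises with quality.

    alpha and beta are illustrative placeholders, not trained or published values.
    """
    w_ivec = 1.0 / (1.0 + math.exp(-(alpha * quality + beta)))
    return (1.0 - w_ivec) * score_gmm + w_ivec * score_ivec


if __name__ == "__main__":
    # Toy 2-component, 1-dimensional UBM and a very short "utterance" of 3 frames.
    ubm_weights = [0.5, 0.5]
    ubm_means = [[0.0], [5.0]]
    ubm_vars = [[1.0], [1.0]]
    frames = [[0.0], [5.0], [0.1]]

    n_stats = zero_order_stats(frames, ubm_weights, ubm_means, ubm_vars)
    q = occupancy_entropy_quality(n_stats)
    fused = quality_weighted_fusion(score_gmm=1.2, score_ivec=2.0, quality=q)
    print(n_stats, q, fused)
```

The occupancies N_c sum to the number of frames, so with very short utterances many UBM components presumably receive little or no data, which is one plausible reason a statistic built on N_c can indicate how reliable the estimated model parameters are. In a practical system the fusion parameters would typically be trained on development data (for example with logistic regression) rather than fixed by hand as in this sketch.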
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::bc70fa86e878d5f3b162a61bf9046d20 https://doi.org/10.1016/j.dsp.2019.01.023