Fusion of Heterogeneous Speaker Recognition Systems in the STBU Submission for the NIST Speaker Recognition Evaluation 2006

Authors: Frantisek Grezl, Martin Karafiat, Jan Cernocky, Ondrej Glembek, Petr Schwarz, Pavel Matejka, Niko Brümmer, D.A. van Leeuwen, Albert Strasheim, Lukas Burget
Contributors: TNO Defensie en Veiligheid
Publication year: 2007
Source: IEEE Transactions on Audio, Speech, and Language Processing. 15:2072-2084
ISSN: 1558-7924, 1558-7916
DOI: 10.1109/tasl.2007.902870
Description: This paper describes and discusses the "STBU" speaker recognition system, which performed well in the NIST Speaker Recognition Evaluation 2006 (SRE). STBU is a consortium of four partners: Spescom DataVoice (Stellenbosch, South Africa), TNO (Soesterberg, The Netherlands), BUT (Brno, Czech Republic), and the University of Stellenbosch (Stellenbosch, South Africa). The STBU system was a combination of three main kinds of subsystems: 1) GMM, with short-time Mel frequency cepstral coefficient (MFCC) or perceptual linear prediction (PLP) features, 2) Gaussian mixture model-support vector machine (GMM-SVM), using GMM mean supervectors as input to an SVM, and 3) maximum-likelihood linear regression-support vector machine (MLLR-SVM), using MLLR speaker adaptation coefficients derived from an English large vocabulary continuous speech recognition (LVCSR) system. All subsystems made use of supervector subspace channel compensation methods: either eigenchannel adaptation or nuisance attribute projection. We document the design and performance of all subsystems, as well as their fusion and calibration via logistic regression. Finally, we also present a cross-site fusion that was done with several additional systems from other NIST SRE-2006 participants. © 2006 IEEE.
Database: OpenAIRE