Vocal tract length normalization using rapid maximum-likelihood estimation for speech recognition.

Author: Emori, Tadashi; Shinoda, Koichi
Subject:
Source: Systems & Computers in Japan; 5/1/2002, Vol. 33 Issue 5, p30-40, 11p
Abstract: In recent years, speaker normalization techniques have been proposed for large-vocabulary speech recognition systems based on hidden Markov models (HMMs); those that correct for differences in vocal tract length between speakers are referred to as vocal tract length normalization. This paper proposes a scheme that approximates small changes in vocal tract length by a linear mapping in cepstrum space governed by a vocal tract length parameter, and that estimates this parameter from the speaker's utterances by maximum likelihood. The proposed method can estimate a better parameter for a speaker, with less computation, than earlier schemes that prepare multiple vocal tract length parameters in advance. In evaluation tests on the recognition of 5000 single Japanese words, the proposed scheme reduced errors by 7.1% on its own and by 14.6% in combination with cepstrum mean normalization (CMN). © 2002 Wiley Periodicals, Inc. Syst Comp Jpn, 33(5): 30–40, 2002; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/scj.1125 [ABSTRACT FROM AUTHOR]
Database: Supplemental Index
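
Illustrative note: the abstract describes a linear mapping in cepstrum space controlled by one vocal tract length parameter, estimated by maximum likelihood. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes a first-order model c' = c + alpha * (B c), diagonal-covariance Gaussians already aligned to each frame, and a likelihood that ignores the Jacobian of the transform. The matrix B, the function names, and the toy data are all hypothetical placeholders; the record does not give the paper's actual mapping.

```python
# Sketch only: closed-form ML estimate of one warping parameter alpha under
# the ASSUMED model  c_warped = c + alpha * (B @ c).  B is a placeholder
# matrix, not the mapping derived in the paper.

def mat_vec(B, c):
    """Multiply matrix B (list of rows) by vector c."""
    return [sum(b * x for b, x in zip(row, c)) for row in B]

def warp(c, alpha, B):
    """Apply the assumed first-order linear warp to one cepstral vector."""
    d = mat_vec(B, c)
    return [x + alpha * dx for x, dx in zip(c, d)]

def estimate_alpha(frames, means, variances, B):
    """Return the alpha that maximizes the Gaussian log-likelihood.

    frames[t], means[t], variances[t] are equal-length lists; each frame is
    assumed to be aligned to one diagonal Gaussian (for example by a prior
    recognition pass). Setting the derivative of the weighted squared error
    to zero gives alpha = num / den.
    """
    num = 0.0
    den = 0.0
    for c, mu, var in zip(frames, means, variances):
        d = mat_vec(B, c)  # direction in which alpha moves the frame
        for x, dx, m, v in zip(c, d, mu, var):
            num += dx * (m - x) / v
            den += dx * dx / v
    return num / den if den != 0.0 else 0.0

# Toy data (hypothetical): model means are the frames warped by a known alpha.
B = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 2.0],
     [0.0, 2.0, 0.0]]
frames = [[1.0, 0.5, -0.2], [0.8, -0.3, 0.4]]
true_alpha = 0.05
means = [warp(c, true_alpha, B) for c in frames]
variances = [[1.0, 2.0, 0.5], [1.5, 1.0, 1.0]]

alpha_hat = estimate_alpha(frames, means, variances, B)
print(alpha_hat)
```

Because the parameter enters linearly under this assumption, the estimate comes from a single pass over the data rather than from rescoring the speech with each of several pre-set warping factors, which is consistent with the computational saving the abstract claims.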