Abstrakt: |
The problem of recognizing strings of connected digits is crucial to a number of applications, such as voice dialing of telephone numbers, automatic data entry, credit card entry, PIN (personal identification number) entry, and entry of access codes for transactions. Algorithms for connected digit recognition, based on whole-word reference patterns, have become increasingly sophisticated and have been shown capable of achieving high recognition performance. Much of this complexity is derived from the design of specialized word models suitable solely for connected digit recognition. For example, in Doddington (1989), context-dependent modeling for the digits two and four and confusion class models for likely digit confusions were used to give high-performance digit recognition. Historically, the training and modeling techniques developed for connected digit recognition have been improved and successfully incorporated into large-vocabulary recognition systems. Based largely on these techniques, a number of systems for large-vocabulary speech recognition have been proposed and implemented, and these have achieved high word recognition accuracy. In this paper we reverse the direction of technology flow; namely, we show how the improved acoustic modeling techniques (using a continuous density hidden Markov model framework) developed for large-vocabulary speech recognition applications can be applied to the problem of connected digit recognition, with no changes made to the basic modeling techniques and with no vocabulary-specific information used. The improved modeling techniques adopted in this study include an improved feature analysis procedure that incorporates higher-order cepstral and log energy time derivatives, and an improved acoustic resolution procedure that uses more Gaussian mixture components per state to characterize the acoustic variability in each state of the model.
Using these techniques, string accuracies of 98.6% for unknown length strings and 99.2% for known length strings were achieved on the standard Texas Instruments connected digits database. These string accuracies are a factor of 2 better than those previously reported using the same modeling procedures (Rabiner et al., 1989b), and are even somewhat better than those reported by Doddington using specialized modeling techniques for the digits (Doddington, 1989). Copyright 1993, 1999 Academic Press