Improvements in Switchboard recognition and topic identification

Authors: Don McAllaster, S. Lowe, V. Nagesha, S. Connolly, L. Gillick, Barbara Peskin
Year of publication: 2002
Subject:
Source: ICASSP
DOI: 10.1109/icassp.1996.540418
Description: We revisit a topic identification test on the Switchboard Corpus first reported by Gillick et al. (see Proc. ICASSP-93, 1993 and ARPA Workshop on Human Language Technology, 1993). This approach to topic ID uses a large-vocabulary continuous speech recognizer as a front end to transcribe the speech, and then scores the transcripts using a set of topic-specific language models. Our recognition of conversational telephone speech has improved dramatically in the three years since the original test, with word error rates dropping from the 90% range to the 40% range. Changing only the recognition engine but otherwise leaving our 1993 topic ID system in place, the rate of message misclassification drops from 33/120 in 1993 to 1/120 now, the same error rate that we obtain from the true transcriptions. This paper describes the topic classification test and the many improvements to the recognition engine that made such a dramatic reduction possible.
Database: OpenAIRE
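
The scoring step outlined in the description (transcribe a message, then score the transcript against one language model per topic and pick the best) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the record does not say what form the topic-specific language models take, so the sketch uses add-alpha smoothed unigram models, and the topic names, training sentences, and function names are hypothetical toy values, not data or code from the paper. The last function simply converts the reported error counts (33/120 and 1/120) into rates.

```python
import math
from collections import Counter


def train_unigram_models(topic_docs, alpha=1.0):
    """Build one add-alpha smoothed unigram model per topic (an assumed model form).

    topic_docs maps a topic name to a list of training transcripts (strings).
    """
    vocab = {w for docs in topic_docs.values() for d in docs for w in d.lower().split()}
    vocab_size = len(vocab) + 1  # one extra slot shared by all unseen words
    models = {}
    for topic, docs in topic_docs.items():
        counts = Counter(w for d in docs for w in d.lower().split())
        models[topic] = (counts, sum(counts.values()), vocab_size, alpha)
    return models


def log_score(model, transcript):
    """Log-likelihood of a transcript under one topic's unigram model."""
    counts, total, vocab_size, alpha = model
    return sum(
        math.log((counts[w] + alpha) / (total + alpha * vocab_size))
        for w in transcript.lower().split()
    )


def classify(models, transcript):
    """Return the topic whose model gives the transcript the highest score."""
    return max(models, key=lambda topic: log_score(models[topic], transcript))


def misclassification_rate(errors, total):
    """Fraction of messages assigned to the wrong topic."""
    return errors / total


# Hypothetical toy training data; the real test used Switchboard conversations.
toy_docs = {
    "pets": ["my dog likes to walk", "the cat sleeps all day"],
    "cars": ["the engine needs new oil", "i bought a used truck"],
}
models = train_unigram_models(toy_docs)

# A recognizer transcript may contain errors; the classifier only needs
# enough topic-bearing words to survive.
print(classify(models, "the dog and the cat"))
print(classify(models, "uh the truck engine"))
print(f"1993: {misclassification_rate(33, 120):.1%}, now: {misclassification_rate(1, 120):.1%}")
```

Because classification only compares scores across topics, a transcript with many recognition errors can still be assigned correctly as long as the errors do not systematically favor a wrong topic, which is consistent with the description's observation that the recognized and true transcriptions yield the same misclassification rate.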