Improvements in Switchboard recognition and topic identification
Authors: | Don McAllaster, S. Lowe, V. Nagesha, S. Connolly, L. Gillick, Barbara Peskin |
---|---|
Year of publication: | 2002 |
Subject: | Vocabulary; Computer science; Speech recognition; Word error rate; Identification (information); Transcription (linguistics); Language technology; Language model; Artificial intelligence; Natural language processing; Word (computer architecture); Natural language |
Source: | ICASSP (IEEE International Conference on Acoustics, Speech, and Signal Processing) |
DOI: | 10.1109/icassp.1996.540418 |
Description: | We revisit a topic identification test on the Switchboard Corpus first reported by Gillick et al. (see Proc. ICASSP-93, 1993, and the ARPA Workshop on Human Language Technology, 1993). This approach to topic ID uses a large-vocabulary continuous speech recognizer as a front end to transcribe the speech and then scores the transcripts with a set of topic-specific language models. Our recognition of conversational telephone speech has improved dramatically in the three years since the original test, with word error rates dropping from the 90% range to the 40% range. When we change only the recognition engine and otherwise leave our 1993 topic ID system in place, the message misclassification rate drops from 33/120 in 1993 to 1/120 now, the same error rate that we obtain from the true transcriptions. This paper describes the topic classification test and the many improvements to the recognition engine that made such a dramatic reduction possible. |
Database: | OpenAIRE |
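The description outlines the topic-ID back end: a recognizer transcribes each message, and the transcript is scored against a set of topic-specific language models. The Python sketch below illustrates only that scoring step, under assumptions that are not taken from the paper: each topic model is a unigram model with add-one smoothing over a shared vocabulary, words outside that vocabulary are ignored, and a transcript is assigned to the topic with the highest log-likelihood. The names `UnigramTopicModel` and `classify`, and the toy training data, are hypothetical; the 1993 system's actual models and decision rule may differ.

```python
import math
from collections import Counter
from typing import Dict, List, Set


class UnigramTopicModel:
    """Hypothetical topic-specific unigram language model with add-alpha smoothing."""

    def __init__(self, training_words: List[str], vocab: Set[str], alpha: float = 1.0):
        self.counts = Counter(training_words)
        self.total = sum(self.counts.values())
        self.vocab = vocab
        self.alpha = alpha

    def log_prob(self, word: str) -> float:
        # Additive smoothing gives every vocabulary word a nonzero probability;
        # the probabilities sum to 1 over the shared vocabulary.
        return math.log((self.counts[word] + self.alpha) /
                        (self.total + self.alpha * len(self.vocab)))

    def score(self, transcript: List[str]) -> float:
        # Log-likelihood of the transcript, treating words as independent
        # and skipping words outside the shared vocabulary (an assumption).
        return sum(self.log_prob(w) for w in transcript if w in self.vocab)


def classify(transcript: List[str], models: Dict[str, UnigramTopicModel]) -> str:
    # Assign the transcript to the topic whose model scores it highest.
    return max(models, key=lambda topic: models[topic].score(transcript))


# Toy usage: two hypothetical topics trained on tiny word lists.
train = {
    "pets": "my dog and my cat like the park".split(),
    "weather": "rain and snow and cold wind today".split(),
}
vocab = {w for words in train.values() for w in words}
models = {topic: UnigramTopicModel(words, vocab) for topic, words in train.items()}
print(classify("the dog likes the cat".split(), models))  # pets
```

In a real system the transcripts would be recognizer output, so the scoring must tolerate recognition errors; the abstract's point is that as word error rates fell, the same classifier's accuracy approached what it achieves on true transcriptions.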