Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech
Author: | Dan Jurafsky, Noah Coccaro, Elizabeth Shriberg, Paul Taylor, Rebecca Bates, Andreas Stolcke, Klaus Ries, Rachel Martin, Marie Meteer, Carol Van Ess-Dykema |
---|---|
Publication year: | 2000 |
Subject: |
FOS: Computer and information sciences
Linguistics and Language; Computer science; Speech recognition; Language and Linguistics; Dialog act; Artificial Intelligence; Conversation; Prosody; Hidden Markov model; Backchannel; Computer Science - Computation and Language; Grammar; I.2.7; Computer Science Applications; Word recognition; Computation and Language (cs.CL); Natural language processing; Coherence (linguistics) |
DOI: | 10.48550/arxiv.cs/0006023 |
Description: | We describe a statistical approach for modeling dialogue acts in conversational speech, i.e., speech-act-like units such as Statement, Question, Backchannel, Agreement, Disagreement, and Apology. Our model detects and predicts dialogue acts based on lexical, collocational, and prosodic cues, as well as on the discourse coherence of the dialogue act sequence. The dialogue model is based on treating the discourse structure of a conversation as a hidden Markov model and the individual dialogue acts as observations emanating from the model states. Constraints on the likely sequence of dialogue acts are modeled via a dialogue act n-gram. The statistical dialogue grammar is combined with word n-grams, decision trees, and neural networks modeling the idiosyncratic lexical and prosodic manifestations of each dialogue act. We develop a probabilistic integration of speech recognition with dialogue modeling, to improve both speech recognition and dialogue act classification accuracy. Models are trained and evaluated using a large hand-labeled database of 1,155 conversations from the Switchboard corpus of spontaneous human-to-human telephone speech. We achieved good dialogue act labeling accuracy (65% based on errorful, automatically recognized words and prosody, and 71% based on word transcripts, compared to a chance baseline accuracy of 35% and human accuracy of 84%) and a small reduction in word recognition error. Comment: 35 pages, 5 figures. Changes in copy editing (note title spelling changed) |
Database: | OpenAIRE |
External link: |
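
The description above combines per-utterance evidence with a dialogue act n-gram inside a hidden Markov model. The following is a minimal sketch of that idea, not the authors' implementation: it assumes the dialogue act labels are decoded as the hidden sequence, uses a bigram as the dialogue grammar, and runs standard Viterbi decoding. The function name `viterbi` and every probability in the toy tables are hypothetical values chosen for illustration only; in the described system the per-utterance likelihoods would come from word n-grams, decision trees, or neural networks.

```python
import math


def viterbi(likelihoods, transitions, initial):
    """Return the most probable dialogue act sequence.

    likelihoods: one dict per utterance, act -> P(evidence | act)
    transitions: dict (previous_act, act) -> P(act | previous_act)
    initial:     dict act -> P(act) for the first utterance
    """
    acts = list(initial)
    # Best log score of any path that ends in each act at the first utterance.
    scores = {a: math.log(initial[a]) + math.log(likelihoods[0][a]) for a in acts}
    backpointers = []
    for evidence in likelihoods[1:]:
        new_scores, pointers = {}, {}
        for cur in acts:
            # Pick the predecessor that maximizes path score plus transition.
            best_prev = max(
                acts, key=lambda p: scores[p] + math.log(transitions[(p, cur)])
            )
            new_scores[cur] = (
                scores[best_prev]
                + math.log(transitions[(best_prev, cur)])
                + math.log(evidence[cur])
            )
            pointers[cur] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final act.
    path = [max(acts, key=lambda a: scores[a])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical toy tables (illustrative numbers, not taken from the paper).
initial = {"Statement": 0.5, "Question": 0.3, "Backchannel": 0.2}
transitions = {
    ("Statement", "Statement"): 0.5, ("Statement", "Question"): 0.2,
    ("Statement", "Backchannel"): 0.3,
    ("Question", "Statement"): 0.7, ("Question", "Question"): 0.1,
    ("Question", "Backchannel"): 0.2,
    ("Backchannel", "Statement"): 0.6, ("Backchannel", "Question"): 0.2,
    ("Backchannel", "Backchannel"): 0.2,
}
likelihoods = [
    {"Statement": 0.1, "Question": 0.8, "Backchannel": 0.1},
    {"Statement": 0.4, "Question": 0.5, "Backchannel": 0.1},
    {"Statement": 0.2, "Question": 0.1, "Backchannel": 0.7},
]

decoded = viterbi(likelihoods, transitions, initial)
# Labels chosen per utterance without any sequence model, for comparison.
local = [max(e, key=e.get) for e in likelihoods]
print(decoded)
print(local)
```

With these toy numbers the second utterance is locally closer to Question, but the bigram makes Statement the more probable label after a Question, so the decoded sequence differs from the per-utterance choice. That is the role the description assigns to the dialogue act n-gram: constraining the likely sequence of acts.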