Popis: |
Due to progress in speech recognition and understanding technology, spoken dialogue systems (SDS) have emerged as a practical alternative for a conversational computer interface. They are more effective than Interactive Voice Response (IVR) systems because they allow a freer and more natural interaction. Spoken dialogue systems are designed to provide automatic dialogue-based voice services accessible over the telephone. Such systems consist of a number of components that must work together for the system to function successfully (McTear, 2005). The basic architecture of an SDS consists of (more or less indispensable) modules: dialogue manager, language understanding, speech recognition, access device interface, language generation, and text-to-speech synthesis.

The demand for ease of implementation and rapid development of voice services led to a standardization effort in which the World Wide Web Consortium plays an important role. The proposed book chapter focuses on the design, development, and evaluation of spoken dialogue systems based on the W3C Recommendations. The World Wide Web Consortium is an international community that develops standards to ensure the long-term growth of the Web (W3C, 2010). One of its working groups, the Voice Browser Working Group, prepares standards for voice-enabled technologies. The main idea is to build a "voice browser" that enables access to information by voice, much as a web browser does visually. Comparing the definitions of a spoken dialogue system and a voice browser leads to the conclusion that the two are very similar, if not de facto identical. The Voice Browser Working Group defined a group of XML-based languages, the Speech Interface Framework (SIF), to enable speech communication between user and computer. Over the last decade, the W3C SIF Recommendations have become industry standards in the voice-enabled technology domain.
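As a rough illustration of what a voice browser interprets, the sketch below is a minimal VoiceXML 2.1 document: a single form that prompts the caller, listens against a speech grammar, and echoes the recognized value. The grammar file name `cities.grxml` is a placeholder, not part of the original text.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <!-- One form = one dialogue exchange: prompt, recognize, respond. -->
  <form id="greeting">
    <field name="city">
      <prompt>Which city are you calling about?</prompt>
      <!-- "cities.grxml" is a hypothetical SRGS grammar file. -->
      <grammar src="cities.grxml" type="application/srgs+xml"/>
      <filled>
        <prompt>You said <value expr="city"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

The same document can be served by an ordinary web server, which is exactly the analogy to the web browser that motivates the voice browser idea.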
The languages in SIF also define the interfaces between the fundamental subsystems of a spoken dialogue system and thus determine its basic structure. The main languages in the framework are VoiceXML, SRGS, and SSML, which are used to compose dialogues, speech grammars, and instructions for text-to-speech systems, respectively. CCXML serves for handling the I/O (telephony) devices. The SISR specification defines semantic tags for speech grammars that enable extraction of the meaning of the user's input. The meaning can be represented in the EMMA language, which was also prepared by the W3C.
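The interplay of SRGS and SISR described above can be sketched with a small confirmation grammar; the rule name and the semantic values ("accept", "reject") are illustrative choices, not taken from the original text. Each spoken variant is mapped by a SISR tag to one normalized value that the dialogue manager can act on.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         xml:lang="en-US" root="yesno" tag-format="semantics/1.0">
  <!-- SISR tags (tag-format "semantics/1.0") assign a normalized
       semantic value regardless of which phrase was spoken. -->
  <rule id="yesno">
    <one-of>
      <item>yes <tag>out = "accept";</tag></item>
      <item>sure <tag>out = "accept";</tag></item>
      <item>no <tag>out = "reject";</tag></item>
    </one-of>
  </rule>
</grammar>
```

Whether the caller says "yes" or "sure", the recognizer returns the same semantic result, which can then be serialized in EMMA for downstream components.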