Universal Human-Machine Speech Interface

Authors: Rosenfeld, Ronald; Olsen, Dan; Rudnicky, Alexander
Publication year: 2022
Subject:
DOI: 10.1184/r1/21708521.v1
Description: We call for investigation and evaluation of universal paradigms for human-machine speech communication. The vision driving us is ubiquitous human-machine interactivity via speech, and increased accessibility to technology for larger segments of the population. Speech recognition technology has made spoken interaction with machines feasible; simple applications have enjoyed commercial success. However, no suitable universal interaction paradigm has yet been proposed for humans to effectively, efficiently and effortlessly communicate by voice with machines. On the one hand, systems based on natural language interaction have been successfully demonstrated in very narrow domains. However, such systems require a lengthy development phase that is data- and labor-intensive, with heavy involvement by experts who meticulously craft the vocabulary, grammar and semantics for the specific domain. The need for such specialized knowledge engineering continues to hamper the adoption of natural language interfaces. Perhaps more importantly, unconstrained natural language severely strains recognition technology, and fails to delineate the functional limitations of the machine. On the other hand, telephone-based IVR systems use carefully crafted hierarchical menus navigated by DTMF tones or short spoken phrases. These systems are commercially viable for some applications, but are typically loathed due to their inefficiency, rigidity, incompleteness and high cognitive demand. These shortcomings prevent them from being deployed more widely. These two interaction styles are extremes along a continuum. Natural language is the most effortless and flexible communication method for humans. For machines, however, it is challenging in limited domains and altogether infeasible otherwise. Menu systems are easy for computers and assure the best speech recognition performance due to their low branch-out factor. However, they are too cumbersome, rigid and inefficient to be widely accepted by humans.
The optimal style, or paradigm, for human-machine communication arguably lies somewhere in between: more regular than natural language, yet more flexible than simple hierarchical menus. The key problem is to understand the desired properties of such a style. We have analyzed human communication with a variety of machines, appliances, information servers and database managers, and plan to propose and evaluate a universal interface style. Such a style consists of a metaphor (similar to the desktop metaphor in graphical interfaces), a set of universal interaction primitives (help request, navigation, confirmation, correction, etc.), and a graphical component for applications that have a display. Extensive user studies will be conducted to evaluate the habitability of the proposed interface and the transfer of user skills across applications. In addition, a toolkit will be created to facilitate rapid development of compliant applications, and its usefulness will be empirically assessed.
Database: OpenAIRE