Intelligent Multi-modal Interfaces for Mobile Applications in Hostile Environment (IM-HOST)

Author:	Paul-Henri Rey, Joel Praveen Pinto, Jean-Frédéric Wagen, Claude Stricker, Hynek Hermansky, Guillermo Aradilla, Hervé Bourlard, Jérôme Théraulaz
Year of publication:	2009
Subject:	Dynamic time warping; Engineering; Voice activity detection; Microphone; business.industry; Keyword spotting; Speech recognition; Noise (video); Voice command device; business; Host (network); Wireless microphone
Source:	Lecture Notes in Computer Science; ISBN: 9783642004360; Human Machine Interaction
Description:	Multi-modal interfaces for mobile applications include tiny screens, keyboards, touch screens, earphones, microphones and software components for voice-based man-machine interaction. The software enabling voice recognition, as well as the microphone, is of primary importance in a noisy environment. The current performance of voice applications is reasonably good in quiet environments. However, the surrounding noise in many practical situations substantially degrades the quality of the speech signal. As a consequence, the recognition rate decreases significantly. Noise management is a major focus in developing voice-enabled technologies. This project addresses the problem of voice recognition with the goal of reaching a high success rate (ideally above 99%) in an outdoor environment that is noisy and hostile: the user stands on the open deck of a motorboat and uses his/her voice to command applications running on a laptop through a wireless microphone. In addition to the problem of noise, there are other constraints strongly limiting the hardware options. Furthermore, the user must also perform several tasks simultaneously. The success of the solution must rely on the efficiency and effectiveness of the voice recognition algorithm and the choice of the microphone. In addition, the training of the recognizer should be kept to a minimum and recognition should take no longer than 3 seconds. For these two reasons, only a limited set of voice commands has been tested. A first demonstrator based on digit keyword spotting trained on phone speech performed poorly in very noisy conditions. A second demonstrator combining neural network and template matching techniques led to nearly acceptable results when the user recorded the keywords. Since the recognition rate was approximately 90%, no additional field test was undertaken.
This R&D project shows that state-of-the-art research on voice recognition needs further investigation in order to recognize spoken keywords in noisy environments. In addition to ongoing improvements, unconventional research approaches worth testing include deriving keywords adapted to specialized algorithms and having the user learn these keywords.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::e70dc109d0f7021644ae67caa4d88fec https://doi.org/10.1007/978-3-642-00437-7_4