Eliminating Data Collection Bottleneck for Wake Word Engine Training Using Found and Synthetic Data

Authors: Thomas Woo, Anil Rana, Lawrence Drabeck, Troy Cauble, Buvaneswari Ramanan
Year of publication: 2019
Source: IEEE BigData
DOI: 10.1109/bigdata47090.2019.9006601
Description: Voice interfaces are fast becoming an important class of human-machine interface, and Wake Word Engines (WWEs) are a critical part of modern voice interfaces. There have been recent advances in applying Deep Learning (DL) and Deep Neural Network architectures to WWE construction. As in other applications of DL, however, achieving good accuracy depends strongly on training with the right type of dataset, a task that traditionally requires significant time and human effort. In this paper, we present novel techniques for curating WWE datasets that greatly reduce the need for data collection from humans. More specifically, we investigate two techniques: (1) using “Found Data”: we create automated data curation pipelines that locate and process data from public sources (e.g., YouTube), and (2) using Synthetic Data: we explore the use of synthesized speech (via Text-to-Speech and Voice Conversion) to generate WWE datasets. Using these techniques, we are able to train WWEs whose performance is on par with WWEs trained on data from expensive human collection (e.g., Mechanical Turk). For example, in our experiments with the wake words ‘Computer’ and ‘Shannon’, at a False Alarm rate of 1 per hour, WWEs trained with our methods exhibit a False Reject Rate, averaged across four different test environments, of 0.9% and 1.6% respectively, compared with 1.0% and 0.8% for the baselines trained on human-collected data. Cycle-time savings of more than an order of magnitude are possible with these methods.
Database: OpenAIRE
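
The description above mentions two techniques (found-data curation and synthetic data via Text-to-Speech and Voice Conversion) and an evaluation at a fixed False Alarm budget. The two sketches below are minimal Python illustrations of those ideas, not the authors' actual pipeline, which this record does not specify. In the first, the pyttsx3 engine, the output directory, and the voice/rate sweep are illustrative assumptions standing in for a TTS-plus-variation step.

```python
# Hedged sketch: generate synthetic wake-word utterances with an off-the-shelf
# TTS engine so that no human recordings are needed. pyttsx3, OUT_DIR, and the
# voice/rate sweep are assumptions, not the paper's toolchain.
import os
import pyttsx3

WAKE_WORDS = ["Computer", "Shannon"]   # wake words evaluated in the paper
OUT_DIR = "synthetic_wwe_data"         # hypothetical output directory

os.makedirs(OUT_DIR, exist_ok=True)
engine = pyttsx3.init()

# Sweep the voices installed on the system and a few speaking rates to add
# speaker variety, a crude stand-in for the paper's TTS + Voice Conversion step.
for vi, voice in enumerate(engine.getProperty("voices")):
    engine.setProperty("voice", voice.id)
    for word in WAKE_WORDS:
        for rate in (150, 175, 200):
            engine.setProperty("rate", rate)
            out_path = os.path.join(OUT_DIR, f"{word.lower()}_voice{vi}_rate{rate}.wav")
            engine.save_to_file(word, out_path)

engine.runAndWait()  # render all queued utterances to disk
```

The reported numbers (False Reject Rate at a False Alarm rate of 1 per hour) imply an operating-point evaluation. A minimal sketch of that computation, assuming the detector produces one score per positive clip and one score per candidate detection found in wake-word-free audio:

```python
import numpy as np

def frr_at_fa_budget(pos_scores, neg_scores, neg_hours, fa_per_hour=1.0):
    """False Reject Rate at the threshold allowing at most `fa_per_hour` false alarms.

    pos_scores: detector scores for clips that contain the wake word.
    neg_scores: scores of candidate detections in wake-word-free audio.
    neg_hours:  total duration of that negative audio, in hours.
    """
    neg_sorted = np.sort(np.asarray(neg_scores))[::-1]        # strongest false alarms first
    allowed = int(round(fa_per_hour * neg_hours))              # false-alarm budget
    # A detection fires when score > threshold, so taking the (allowed+1)-th
    # strongest negative score as the threshold keeps false alarms within budget.
    threshold = neg_sorted[allowed] if allowed < len(neg_sorted) else -np.inf
    frr = float(np.mean(np.asarray(pos_scores) <= threshold))
    return threshold, frr
```

Given real detector scores, a call such as `frr_at_fa_budget(pos, neg, neg_hours=10.0)` would yield the kind of operating point behind the 0.9%/1.6% versus 1.0%/0.8% comparison quoted in the description.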