One-Shot Speaker Identification for a Service Robot using a CNN-based Generic Verifier

Author:	Vélez, Ivette; Rascon, Caleb; Fuentes-Pineda, Gibrán
Year of publication:	2018
Subject:	Electrical Engineering and Systems Science - Audio and Speech Processing; Computer Science - Sound
Document type:	Working Paper
Description:	In service robotics, there is interest in identifying the user by voice alone. However, in application scenarios where a service robot acts as a waiter or a store clerk, new users are expected to enter the environment frequently. Typically, speaker identification models need to be retrained when this occurs, which can take an impractical amount of time. In this paper, a new approach to speaker identification through verification is developed using a Siamese Convolutional Neural Network (SCNN) architecture, which learns to verify generically whether two audio signals come from the same speaker. Given an external database of recorded audio of the users, identification is carried out by verifying the speech input against each of its entries. When new users are encountered, they can be identified simply by adding their recorded audio to the external database, without retraining. The system was evaluated on four aspects: the performance of the verifier, the performance of the system as a classifier using clean audio, its speed, and its accuracy in real-life settings. Its performance, in conjunction with its one-shot-learning capabilities, makes the proposed system a viable alternative for speaker identification in service robots. Comment: 8 pages, 9 figures, 2 tables. This paper is under review as a submission for RA-L and ICRA, for the IEEE Robotics and Automation Letters (RA-L). A video demonstration of the full system, as well as all relevant downloads (corpora, source code, models, etc.), can be found at: http://calebrascon.info/oneshotid/
Database:	arXiv
External link:	http://arxiv.org/abs/1809.04115
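
The identification-through-verification idea in the description can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_verifier` is a hypothetical stand-in (cosine similarity over toy feature vectors) for the trained SCNN, and the `threshold` used to reject unknown speakers is an assumption that the description does not mention. The names `identify` and `database` are likewise illustrative.

```python
import math
from typing import Callable, Dict, List, Optional

Vector = List[float]


def toy_verifier(a: Vector, b: Vector) -> float:
    """Hypothetical stand-in for the SCNN verifier.

    Returns a same-speaker score in [0, 1] based on cosine similarity.
    The real system would score two audio signals with a trained network.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 0.5 * (dot / norm + 1.0)


def identify(
    query: Vector,
    database: Dict[str, Vector],
    verifier: Callable[[Vector, Vector], float] = toy_verifier,
    threshold: float = 0.9,  # assumed rejection threshold, not from the paper
) -> Optional[str]:
    """Verify the query against every database entry and return the best match.

    Returns None when no entry scores above the threshold.
    """
    best_name: Optional[str] = None
    best_score = threshold
    for name, reference in database.items():
        score = verifier(query, reference)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Toy database: one reference recording (here, a feature vector) per user.
database: Dict[str, Vector] = {
    "alice": [1.0, 0.0, 0.0],
    "bob": [0.0, 1.0, 0.0],
}

known = identify([0.9, 0.1, 0.0], database)

# Enrolling a new user only adds an entry; the verifier is left untouched.
database["carol"] = [0.0, 0.0, 1.0]
enrolled = identify([0.0, 0.1, 0.9], database)

# A query equally far from every entry falls below the assumed threshold.
unknown = identify([1.0, 1.0, 1.0], database)
```

In this sketch, enrolling `carol` changes only the database, which mirrors the no-retraining property the description claims. The cost of one identification grows linearly with the number of database entries, since the verifier runs once per entry.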