Audio–Visual Sound Source Localization and Tracking Based on Mobile Robot for The Cocktail Party Problem

Autor: Zhanbo Shi, Lin Zhang, Dongqing Wang
Jazyk: angličtina
Rok vydání: 2023
Předmět:
Zdroj: Applied Sciences, Vol 13, Iss 10, p 6056 (2023)
Druh dokumentu: article
ISSN: 2076-3417
DOI: 10.3390/app13106056
Popis: Locating the sound source is one of the most important capabilities of robot audition. In recent years, single-source localization techniques have increasingly matured. However, localizing and tracking specific sound sources in multi-source scenarios, which is known as the cocktail party problem, is still unresolved. In order to address this challenge, in this paper, we propose a system for dynamically localizing and tracking sound sources based on audio–visual information that can be deployed on a mobile robot. Our system first locates specific targets using pre-registered voiceprint and face features. Subsequently, the robot moves to track the target while keeping away from other sound sources in the surroundings instructed by the motion module, which helps the robot gather clearer audio data of the target to perform downstream tasks better. Its effectiveness has been verified via extensive real-world experiments with a 20% improvement in the success rate of specific speaker localization and a 14% reduction in word error rate in speech recognition compared to its counterparts.
Databáze: Directory of Open Access Journals