Visual-guided audio source separation: an empirical study

Authors: Phi-Le Nguyen, Quoc Cuong Nguyen, Manh Nguyen Huu, Thanh Thi Hien Duong, Thi-Lan Le, Hai Nghiem Thi
Publication year: 2021
Subject:
Source: MAPR
DOI: 10.1109/mapr53640.2021.9585244
Description: Real-world video scenes are usually complex because they mix many different audio-visual objects. Humans with normal hearing can easily locate, identify, and differentiate sound sources that are heard simultaneously. For machines, however, the task remains extremely difficult: building machine listening algorithms that automatically separate sound sources under difficult mixing conditions is still very challenging. In this paper, we consider a visual-guided audio source separation approach for separating the sounds of different instruments in a video, where detected visual objects assist the sound separation process. In particular, we investigate the use of different object detectors for the task. In addition, as an empirical study, we analyze the effect of training datasets on separation performance. Finally, experimental results on the MUSIC benchmark dataset confirm the advantages of the new object detector investigated in the paper.
Database: OpenAIRE