Reverberant Sound Localization with a Robot Head Based on Direct-Path Relative Transfer Function

Authors:	Radu Horaud, Laurent Girin, Xiaofei Li, Fabien Badeig
Contributors:	Interpretation and Modelling of Images and Videos (PERCEPTION), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria), Laboratoire Jean Kuntzmann (LJK), Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP), Centre National de la Recherche Scientifique (CNRS), Université Grenoble Alpes [2016-2019] (UGA [2016-2019]), GIPSA - Cognitive Robotics, Interactive Systems, & Speech Processing (GIPSA-CRISSP), Département Parole et Cognition (GIPSA-DPC), Grenoble Images Parole Signal Automatique (GIPSA-lab), IEEE, European Project: 609465, EC:FP7:ICT, FP7-ICT-2013-10, EARS (2014), European Project: 340113, EC:FP7:ERC, ERC-2013-ADG, VHIA (2014)
Language:	English
Publication year:	2020
Subjects:	FOS: Computer and information sciences; Microphone array; Sound (cs.SD); Computer science; Feature vector; Acoustics; 02 engineering and technology; Transfer function; Computer Science - Sound; 030507 speech-language pathology & audiology; 03 medical and health sciences; Computer Science - Robotics; [INFO.INFO-TS] Computer Science [cs]/Signal and Image Processing; Audio and Speech Processing (eess.AS); 0202 electrical engineering, electronic engineering, information engineering; FOS: Electrical engineering, electronic engineering, information engineering; [INFO.INFO-RB] Computer Science [cs]/Robotics [cs.RO]; Impulse response; Short-time Fourier transform; Spectral density; 020206 networking & telecommunications; Noise; Fourier transform; binaural hearing; Computer Science::Sound; [INFO.INFO-SD] Computer Science [cs]/Sound [cs.SD]; sound-source localization; 0305 other medical science; Robotics (cs.RO); Electrical Engineering and Systems Science - Audio and Speech Processing
Source:	IEEE/RSJ International Conference on Intelligent Robots and Systems, IEEE, Oct 2016, Daejeon, South Korea, pp. 2819-2826, doi:10.1109/IROS.2016.7759437
DOI:	10.1109/IROS.2016.7759437
Description:	This paper addresses the problem of sound-source localization (SSL) with a robot head, which remains a challenge in real-world environments. In particular, we are interested in locating speech sources, as they are of high interest for human-robot interaction. The microphone-pair response corresponding to direct-path sound propagation is a function of the source direction. In practice, this response is contaminated by noise and reverberation. The direct-path relative transfer function (DP-RTF) is defined as the ratio between the direct-path acoustic transfer functions (ATFs) of the two microphones, and it is an important feature for SSL. We propose a method to estimate the DP-RTF from noisy and reverberant signals in the short-time Fourier transform (STFT) domain. First, the convolutive transfer function (CTF) approximation is adopted to accurately represent the impulse response of the microphone array; the first coefficient of the CTF is mainly composed of the direct-path ATF. At each frequency, the frame-wise speech auto- and cross-power spectral densities (PSDs) are obtained by spectral subtraction. Then a set of linear equations, in which the DP-RTF is the unknown, is constructed from the speech auto- and cross-PSDs of multiple frames, and the DP-RTF is estimated by solving these equations. Finally, the estimated DP-RTFs are concatenated across frequencies and used as a feature vector for SSL. Experiments with a robot placed in various reverberant environments show that the proposed method outperforms two state-of-the-art methods.
Database:	OpenAIRE
External links:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::6db8df7904d60c014df8dc1de0ef7834 http://arxiv.org/abs/2012.03574
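The per-frequency estimation step summarized in the description can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It relies on the CTF cross-relation a1 * x2 = a2 * x1 (convolution over frames at one frequency bin): dividing by a1(0) gives x2(p) = g^T z(p), where z(p) stacks x1(p), ..., x1(p-Q+1), x2(p-1), ..., x2(p-Q+1) and the first entry of g is a2(0)/a1(0). Multiplying by x1*(p) and averaging over neighbouring frames turns this into one linear equation per frame in auto- and cross-PSDs. Assumptions, all labeled: microphone 1 is the reference (DP-RTF taken as a2(0)/a1(0)); the CTF length Q and averaging window win are free parameters; spectral subtraction is reduced to subtracting noise PSDs passed in as known constants, with noise assumed stationary and uncorrelated across frames; the stacked equations are solved by ordinary least squares. The names stack_ctf_regressors, estimate_dp_rtf and dp_rtf_feature are hypothetical.

```python
import numpy as np


def stack_ctf_regressors(x1, x2, Q):
    """Return Z with rows z(p) = [x1(p), ..., x1(p-Q+1), x2(p-1), ..., x2(p-Q+1)].

    x1, x2: complex STFT coefficients of microphones 1 and 2 at ONE frequency
    bin, shape (P,). Frames before p = 0 are treated as zeros.
    """
    P = len(x1)
    pad1 = np.concatenate([np.zeros(Q - 1, dtype=complex), x1])
    pad2 = np.concatenate([np.zeros(Q - 1, dtype=complex), x2])
    Z = np.empty((P, 2 * Q - 1), dtype=complex)
    for p in range(P):
        i = p + Q - 1  # position of frame p in the padded arrays
        Z[p, :Q] = pad1[i - np.arange(Q)]     # x1(p), ..., x1(p-Q+1)
        Z[p, Q:] = pad2[i - np.arange(1, Q)]  # x2(p-1), ..., x2(p-Q+1)
    return Z


def estimate_dp_rtf(x1, x2, Q=4, win=8, noise_psd11=0.0, noise_psd21=0.0):
    """Estimate a2(0)/a1(0) at one frequency bin from the CTF cross-relation.

    noise_psd11 (noise auto-PSD at mic 1) and noise_psd21 (noise cross-PSD
    E[n2 n1*]) are ASSUMED known and stationary; they stand in for the
    spectral-subtraction step. Noise is assumed uncorrelated across frames,
    so only the same-frame (lag-0) terms are corrected.
    """
    Z = stack_ctf_regressors(x1, x2, Q)
    # Multiply x2(p) = g^T z(p) by conj(x1(p)) frame by frame.
    cross = x2 * np.conj(x1)          # x2(p) x1*(p)
    zx1 = Z * np.conj(x1)[:, None]    # z(p) x1*(p)
    # Moving average over `win` frames gives frame-wise PSD estimates.
    kernel = np.ones(win) / win
    phi_21 = np.convolve(cross, kernel, mode="valid") - noise_psd21
    phi_z1 = np.stack(
        [np.convolve(zx1[:, j], kernel, mode="valid") for j in range(zx1.shape[1])],
        axis=1,
    )
    phi_z1[:, 0] -= noise_psd11       # x1(p) x1*(p) is the only same-frame auto term
    # One equation per PSD frame; solve the stacked system in least squares.
    g, *_ = np.linalg.lstsq(phi_z1, phi_21, rcond=None)
    return g[0]                       # first coefficient approximates the DP-RTF


def dp_rtf_feature(X1, X2, **kwargs):
    """Concatenate per-bin DP-RTF estimates; X1, X2 have shape (K bins, P frames)."""
    return np.array([estimate_dp_rtf(X1[k], X2[k], **kwargs) for k in range(X1.shape[0])])


if __name__ == "__main__":
    # Synthetic check at one bin: x_i(p) = sum_q a_i(q) s(p - q), no noise.
    rng = np.random.default_rng(0)
    Q, P = 4, 300
    a1 = rng.standard_normal(Q) + 1j * rng.standard_normal(Q)
    a2 = rng.standard_normal(Q) + 1j * rng.standard_normal(Q)
    s = rng.standard_normal(P) + 1j * rng.standard_normal(P)
    x1 = np.convolve(s, a1)[:P]
    x2 = np.convolve(s, a2)[:P]
    est = estimate_dp_rtf(x1, x2, Q=Q)
    print(abs(est - a2[0] / a1[0]))  # should be close to zero for noise-free data
```

With noise-free synthetic data the stacked system is exactly consistent, so the first coefficient is recovered up to round-off. With real recordings the quality of the noise PSD estimates and the choices of Q and win would matter, and how the resulting feature vector is matched to a source direction is not specified here.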