Modeling of speech localization in a multi-talker mixture using periodicity and energy-based auditory features

Author:	Volker Hohmann, Norbert Kopčo, Angela Josupeit
Year of publication:	2016
Subject:	Male; Reverberation; Periodicity; Auditory scene analysis; Auditory Pathways; Time Factors; Acoustics and Ultrasonics; Computer science; Acoustics; Speech recognition; Interaural time difference; Intelligibility (communication); Models, Psychological; 01 natural sciences; Signal; Speech Acoustics; 03 medical and health sciences; 0302 clinical medicine; Arts and Humanities (miscellaneous); 0103 physical sciences; Humans; Computer Simulation; Sound Localization; 010301 acoustics; Template matching; Spectral density; Acoustic Stimulation; Speech Perception; Female; Cues; Noise; Binaural recording; Perceptual Masking; 030217 neurology & neurosurgery
Source:	The Journal of the Acoustical Society of America. 139(5)
ISSN:	1520-8524
Description:	A recent study showed that human listeners are able to localize a short speech target simultaneously masked by four speech tokens in reverberation [Kopco, Best, and Carlile (2010). J. Acoust. Soc. Am. 127, 1450–1457]. Here, an auditory model for solving this task is introduced. The model has three processing stages: (1) extraction of the instantaneous interaural time difference (ITD) information, (2) selection of target-related ITD information (“glimpses”) using a template-matching procedure based on periodicity, spectral energy, or both, and (3) target location estimation. The model performance was compared to the human data, and to the performance of a modified model using an ideal binary mask (IBM) at stage (2). The IBM-based model performed similarly to the subjects, indicating that the binaural model is able to accurately estimate source locations. Template matching using spectral energy and using a combination of spectral energy and periodicity achieved good results, while using periodicity alone led to poor results. Particularly, the glimpses extracted from the initial portion of the signal were critical for good performance. Simulation data show that the auditory features investigated here are sufficient to explain human performance in this challenging listening condition and thus may be used in models of auditory scene analysis.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::5daa7720be1298e6eb2a83e335ac4971 https://pubmed.ncbi.nlm.nih.gov/27250183
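
The three processing stages named in the abstract (ITD extraction, glimpse selection, location estimation) can be illustrated with a toy sketch. This is not the authors' model, and every function name here is hypothetical. It rests on three stated assumptions: a plain per-frame cross-correlation stands in for the auditory ITD extraction, a boolean mask supplied by the caller (in the spirit of the ideal binary mask variant) stands in for the template-matching selection, and the most frequent ITD among the selected frames stands in for the location estimator. The frames are synthetic clicks, not speech.

```python
from collections import Counter


def frame_itd(left, right, max_lag):
    """Stage 1 stand-in: lag (in samples) that maximizes the cross-correlation.

    A positive lag means the right-ear signal is delayed relative to the left.
    """
    n = len(left)
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        val = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                val += left[i] * right[j]
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag


def select_glimpses(itds, target_mask):
    """Stage 2 stand-in: keep ITDs only from frames flagged as target-dominated."""
    return [itd for itd, keep in zip(itds, target_mask) if keep]


def estimate_itd(glimpses):
    """Stage 3 stand-in: most frequent ITD among the glimpses (None if empty)."""
    if not glimpses:
        return None
    return Counter(glimpses).most_common(1)[0][0]


def toy_frame(n, target_gain, masker_gain):
    """Toy binaural frame (assumption): a target click with a +3 sample ITD
    and a masker click with a -5 sample ITD, mixed at the given gains."""
    left = [0.0] * n
    right = [0.0] * n
    left[20] += target_gain
    right[23] += target_gain
    left[40] += masker_gain
    right[35] += masker_gain
    return left, right


# Two target-dominated frames among three masker-dominated ones.
mask = [True, False, True, False, False]
frames = [toy_frame(64, 1.0, 0.3) if m else toy_frame(64, 0.3, 1.0) for m in mask]
itds = [frame_itd(l, r, max_lag=8) for l, r in frames]

print(estimate_itd(select_glimpses(itds, mask)))  # 3: target ITD recovered
print(estimate_itd(itds))                         # -5: masker wins without selection
```

Without the selection step the masker-dominated frames outnumber the target frames and the estimate lands on the masker ITD, which is the reason a glimpse-selection stage is needed at all in a multi-talker mixture.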