Estimating fundamental frequency and formants based on periodicity glimpses: a deep learning approach

Authors: Joanna Luberadzka, Hendrik Kayser, Volker Hohmann
Year of publication: 2020
Source: ICHI
DOI: 10.1109/ichi48887.2020.9374386
Description: Despite many technological advances, hearing aids still amplify background sounds together with the signal of interest. To understand how to process acoustic information in a way that is optimal for a human listener, we have to understand why a healthy auditory system performs this task with such great efficiency. Several studies show the importance of so-called auditory glimpses in decoding the auditory scene. They are usually defined as time-frequency bins dominated by one source, which the auditory system may use to track that source in a crowded acoustic space. Josupeit et al. [6]-[8] developed an algorithm inspired by these findings: it extracts speech glimpses, defined as the salient tonal components of a sound mixture, called sparse periodicity-based auditory features (sPAF). In this study, we investigated whether the sPAF can be used to estimate instantaneous voice parameters: the fundamental frequency F0 and the formant frequencies F1 and F2. We used a supervised machine learning technique to find the mapping between the parameter space and the feature space. Using a formant synthesizer, we created a labeled data set containing instantaneous sPAF and the corresponding parameter values. We trained a deep neural network and evaluated the prediction performance of the learned model. The results showed that the sPAF represent the parameters of a single voice very well, which opens up the possibility of using the sPAF for more complex auditory object tracking scenarios.
Database: OpenAIRE
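
The description outlines a supervised regression from sPAF feature vectors to the voice parameters (F0, F1, F2). Below is a minimal sketch of such a setup in PyTorch, assuming the sPAF have already been computed as fixed-length vectors; the feature dimension, network architecture, hyperparameters, and the randomly generated placeholder data are illustrative assumptions, not the authors' actual configuration.

```python
# Sketch of the supervised mapping described above: a feed-forward network
# regressing instantaneous voice parameters (F0, F1, F2) from sparse
# periodicity-based auditory features (sPAF). All dimensions and
# hyperparameters here are assumptions, not the paper's setup.
import torch
import torch.nn as nn

FEATURE_DIM = 128   # assumed length of one instantaneous sPAF vector
N_PARAMS = 3        # regression targets: F0, F1, F2 (in Hz)

# Placeholder data: in the study, inputs are sPAF extracted from the output
# of a formant synthesizer, and labels are the synthesizer's parameter values.
n_samples = 4096
features = torch.randn(n_samples, FEATURE_DIM)
labels = torch.rand(n_samples, N_PARAMS) * torch.tensor([400.0, 1000.0, 2500.0])

model = nn.Sequential(
    nn.Linear(FEATURE_DIM, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, N_PARAMS),   # linear output layer for frequency regression
)
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(10):
    for i in range(0, n_samples, 64):           # simple mini-batch loop
        x, y = features[i:i + 64], labels[i:i + 64]
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.1f}")
```

In practice the frequency targets would typically be normalized or log-scaled before training so that the three outputs contribute comparably to the loss; the raw Hz targets above are kept only to keep the sketch short.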