Estimating fundamental frequency and formants based on periodicity glimpses: a deep learning approach
Author: Joanna Luberadzka, Hendrik Kayser, Volker Hohmann
Year: 2020
Subject: artificial neural network; deep learning; feature vector; speech recognition; acoustic space; formant; video tracking; auditory system; artificial intelligence; otorhinolaryngology; neurology & neurosurgery
Source: ICHI
DOI: 10.1109/ichi48887.2020.9374386
Description: Despite many technological advances, hearing aids still amplify background sounds together with the signal of interest. To understand how to process acoustic information optimally for a human listener, we must understand why a healthy auditory system performs this task so efficiently. Several studies show the importance of so-called auditory glimpses in decoding an auditory scene. These are usually defined as time-frequency bins dominated by a single source, which the auditory system may use to track that source in a crowded acoustic space. Josupeit et al. [6]-[8] developed an algorithm inspired by these findings: it extracts speech glimpses, defined as the salient tonal components of a sound mixture, called sparse periodicity-based auditory features (sPAF). In this study, we investigated whether the sPAF can be used to estimate instantaneous voice parameters: the fundamental frequency F0 and the formant frequencies F1 and F2. We used a supervised machine learning technique to find the mapping between the parameter space and the feature space. Using a formant synthesizer, we created a labeled data set containing instantaneous sPAF and the corresponding parameter values. We then trained a deep neural network and evaluated the prediction performance of the learned model. The results show that the sPAF represent the parameters of a single voice very well, which opens the possibility of using the sPAF for more complex scenarios of auditory object tracking.
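The supervised pipeline described in the abstract (synthesize labeled feature/parameter pairs, then learn a regression from features to F0, F1, F2) can be sketched as follows. This is a minimal illustration only: the random nonlinear feature mapping stands in for the actual sPAF extraction, and the one-hidden-layer network stands in for the paper's deep neural network, whose architecture the abstract does not specify.

```python
import numpy as np

rng = np.random.default_rng(0)

def synthesize_dataset(n=2000, feat_dim=16):
    """Hypothetical stand-in for the formant-synthesizer data set:
    draw voice parameters (F0, F1, F2) in plausible ranges (Hz) and
    map them through a fixed random nonlinearity to fake feature vectors."""
    params = rng.uniform([80, 300, 800], [300, 900, 2500], size=(n, 3))
    proj = rng.normal(size=(3, feat_dim))
    feats = np.tanh(params / 1000 @ proj) + 0.05 * rng.normal(size=(n, feat_dim))
    return feats, params / 1000  # scale labels to ~unit range for training

X, y = synthesize_dataset()

# Tiny one-hidden-layer regressor trained with plain gradient descent.
hid = 32
W1 = rng.normal(scale=0.3, size=(X.shape[1], hid)); b1 = np.zeros(hid)
W2 = rng.normal(scale=0.3, size=(hid, 3)); b2 = np.zeros(3)

def forward(X):
    h = np.tanh(X @ W1 + b1)
    return h, h @ W2 + b2

_, pred0 = forward(X)
mse0 = ((pred0 - y) ** 2).mean()  # loss before training

lr = 0.05
for step in range(500):
    h, pred = forward(X)
    err = pred - y                          # gradient of mean squared error
    gW2 = h.T @ err / len(X); gb2 = err.mean(0)
    dh = (err @ W2.T) * (1 - h ** 2)        # backprop through tanh
    gW1 = X.T @ dh / len(X); gb1 = dh.mean(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

_, pred = forward(X)
mse = ((pred - y) ** 2).mean()
```

The evaluation step in the paper would compare predicted and ground-truth parameter values on held-out data; here we only check that the fit improves on the training set.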
Database: OpenAIRE
External link: