Multi-task learning DNN to improve gender identification from speech leveraging age information of the speaker

Authors: Mousmita Sarma, Nagendra Kumar Goel, Kandarpa Kumar Sarma
Publication year: 2020
Subject:
Source: International Journal of Speech Technology. 23:223-240
ISSN: 1572-8110, 1381-2416
DOI: 10.1007/s10772-020-09680-4
Description: We propose a method that provides the speaker’s age as additional information when training a machine learning model for gender identification. To this end, we design a multi-task learning Deep Neural Network (DNN) whose primary output layer has the speaker’s gender as its target. In addition, we use the speaker’s age group as an auxiliary target for each utterance, where the age groups are formed taking the speaker’s gender into account. We show experimentally that the multi-task learning DNN outperforms a Gaussian Mixture Model (GMM) or a single-task learning DNN trained only for gender recognition on more realistic datasets, that is, datasets containing recordings of speakers from all age groups (children to seniors). The DNN takes the raw speech waveform as input and performs multi-task learning with the freedom to follow gender- and age-discriminative features during training. The raw waveform front end uses convolutional-layer-based filter learning. We further use Long Short-Term Memory cell based recurrent projection (LSTMP) layers to model the temporal dynamics of speech from the learned feature representation.
Database: OpenAIRE
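
Illustrative note: the description above mentions a primary gender target and an auxiliary, gender-aware age-group target per utterance. The following is a minimal sketch of how such targets and a combined training loss might be formed. It is not the authors' implementation. The age boundaries, the group names, the reading of "gender-aware" as one age-group class per gender, the weighted-sum loss, and the `aux_weight` value are all assumptions made for illustration; the record does not specify any of them.

```python
import math

# Hypothetical age bins in years, as half-open intervals [low, high).
# The abstract does not give the actual boundaries.
AGE_BINS = [(0, 13, "child"), (13, 20, "teen"), (20, 60, "adult"), (60, 200, "senior")]
GENDERS = ["female", "male"]

# Assumed interpretation: one auxiliary class per (gender, age bin) pair,
# e.g. "female_child", "male_senior".
AGE_GROUP_CLASSES = [f"{g}_{name}" for g in GENDERS for _, _, name in AGE_BINS]


def make_targets(gender, age):
    """Return (gender_index, age_group_index) for one utterance."""
    for low, high, name in AGE_BINS:
        if low <= age < high:
            return GENDERS.index(gender), AGE_GROUP_CLASSES.index(f"{gender}_{name}")
    raise ValueError(f"age {age} is outside the assumed bins")


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability of the target class."""
    return -math.log(softmax(logits)[target])


def multitask_loss(gender_logits, age_logits, gender_target, age_target, aux_weight=0.5):
    """Primary gender loss plus a weighted auxiliary age-group loss.

    aux_weight is a hypothetical hyperparameter, not taken from the source.
    """
    primary = cross_entropy(gender_logits, gender_target)
    auxiliary = cross_entropy(age_logits, age_target)
    return primary + aux_weight * auxiliary


if __name__ == "__main__":
    g, a = make_targets("male", 65)
    # Placeholder logits standing in for the outputs of the two DNN heads.
    loss = multitask_loss([0.2, 1.1], [0.0] * len(AGE_GROUP_CLASSES), g, a)
    print(AGE_GROUP_CLASSES[a], round(loss, 4))
```

In a setup like this, only the gender head would be used at test time; the age-group head exists to shape the shared layers during training.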