Multi-task learning DNN to improve gender identification from speech leveraging age information of the speaker
Author: | Mousmita Sarma, Nagendra Kumar Goel, Kandarpa Kumar Sarma |
---|---|
Year of publication: | 2020 |
Subject: |
Linguistics and Language
Artificial neural network, Computer science, Speech recognition, Multi-task learning, Filter (signal processing), Mixture model, Language and Linguistics, Human-Computer Interaction, Identification (information), Discriminative model, Feature (machine learning), Computer Vision and Pattern Recognition, Projection (set theory), Software |
Source: | International Journal of Speech Technology. 23:223-240 |
ISSN: | 1572-8110 1381-2416 |
DOI: | 10.1007/s10772-020-09680-4 |
Description: | We propose a method that provides the speaker's age as additional information while training a machine learning model for gender identification. To this end, we design a multi-task learning Deep Neural Network (DNN) in which the primary output layer has the speaker's gender as its target. We also use the speaker's age group as an auxiliary target for each utterance, where the age groups are created taking the speaker's gender into account. We show experimentally that the multi-task learning DNN outperforms both a Gaussian Mixture Model (GMM) and a single-task learning DNN trained only for gender recognition on more realistic datasets, which contain recordings of speakers from all age groups (children to seniors). The DNN takes the raw speech waveform as input, which leaves the multi-task learning free to follow gender- and age-discriminative features during training. The raw waveform front end uses convolutional-layer-based filter learning. Long Short-Term Memory cell based recurrent projection (LSTMP) layers then model the temporal dynamics of speech from the learned feature representation. |
Database: | OpenAIRE |
External link: |
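
The abstract describes a primary gender target and an auxiliary, gender-dependent age-group target for each utterance. The sketch below illustrates that target construction and one common way to combine the two objectives. It is not the authors' implementation: the age boundaries, the class names, the `aux_weight` parameter, and the weighted-sum loss are all assumptions made for illustration, and the network itself (convolutional front end and LSTMP layers) is omitted.

```python
import math

# Hypothetical age-group boundaries in years, as half-open intervals [lo, hi).
# These are NOT taken from the paper.
AGE_BINS = [(0, 13, "child"), (13, 20, "teen"), (20, 60, "adult"), (60, math.inf, "senior")]
GENDERS = ["female", "male"]

# Auxiliary classes are gender-dependent age groups, e.g. "female_adult".
AUX_CLASSES = [f"{g}_{name}" for g in GENDERS for _, _, name in AGE_BINS]


def age_group(age):
    """Map an age in years to a (hypothetical) age-group name."""
    for lo, hi, name in AGE_BINS:
        if lo <= age < hi:
            return name
    raise ValueError(f"age out of range: {age}")


def make_targets(gender, age):
    """Return (primary gender index, auxiliary age-group index) for one utterance."""
    primary = GENDERS.index(gender)
    aux = AUX_CLASSES.index(f"{gender}_{age_group(age)}")
    return primary, aux


def cross_entropy(logits, target):
    """Softmax cross-entropy for one example, computed with log-sum-exp for stability."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target]


def multitask_loss(gender_logits, age_logits, gender_target, age_target, aux_weight=0.5):
    """Primary loss plus a weighted auxiliary loss (the weighting scheme is an assumption)."""
    primary = cross_entropy(gender_logits, gender_target)
    auxiliary = cross_entropy(age_logits, age_target)
    return primary + aux_weight * auxiliary
```

In a setup like this, the auxiliary output would typically be used only during training, with gender read from the primary output at test time.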