Age Group Classification with Speech and Metadata Multimodality Fusion
Author: | Denys Katerenchuk |
---|---|
Year of publication: | 2017 |
Subject: | FOS: Computer and information sciences; Sound (cs.SD); Computer science; computer.software_genre; Computer Science - Sound; 050105 experimental psychology; Task (project management); Multimodality; 03 medical and health sciences; 0302 clinical medicine; Audio and Speech Processing (eess.AS); FOS: Electrical engineering, electronic engineering, information engineering; 0501 psychology and cognitive sciences; Baseline (configuration management); Computer Science - Computation and Language; business.industry; Group (mathematics); 05 social sciences; Metadata; Artificial intelligence; business; Computation and Language (cs.CL); computer; 030217 neurology & neurosurgery; Natural language processing; Electrical Engineering and Systems Science - Audio and Speech Processing |
Source: | EACL (2) |
DOI: | 10.18653/v1/e17-2030 |
Description: | Children comprise a significant proportion of TV viewers, and it is worthwhile to customize the experience for them. However, identifying who in the audience is a child can be a challenging task. Identifying gender and age from audio commands is a well-studied problem, but achieving good accuracy remains very challenging when the utterances are typically only a couple of seconds long. We present initial studies of a novel method that combines utterances with user metadata. In particular, we develop an ensemble of different machine learning techniques on different subsets of data to improve child detection. Our initial results show a 9.2% absolute improvement over the baseline, leading to state-of-the-art performance. |
Database: | OpenAIRE |