Automatic classification of infant vocalization sequences with convolutional neural networks

Authors: Mario Hlawitschka, Mirco Fuchs, Franz Anders
Year of publication: 2020
Subject:
Source: Speech Communication. 119:36-45
ISSN: 0167-6393
DOI: 10.1016/j.specom.2020.03.003
Description: In this study we investigated Convolutional Neural Networks (CNNs) for the classification of infant vocalization sequences. The target classes were ‘crying’, ‘fussing’, ‘babbling’, ‘laughing’ and ‘vegetative vocalizations’. The general case of this classification task is important for applications that require a qualitative evaluation of general infant vocalizations, such as pain assessment or assessment of language acquisition. The classification procedure was based on representing audio segments as spectrograms, which are fed into a conventional CNN architecture. We systematically analyzed the influence of network features on the classification performance to derive guidelines for designing effective CNN architectures for the task. We show that CNNs should be modeled to have a small bottleneck between the convolutional stage and the fully connected stage, achieved through broad aggregation of convolutional feature maps across the time and frequency axes. The best performing CNN configuration yielded a balanced accuracy of 72%. We conclude that conventional CNN architectures can reach satisfactory performance for this task even with small amounts of training data as long as certain network features are ensured.
Database: OpenAIRE
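
Note: The abstract's two quantitative ideas, a small bottleneck between the convolutional and fully connected stages and balanced accuracy as the evaluation metric, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration only: the function names, the feature-map dimensions, the pooling sizes, and the toy labels are hypothetical and are not taken from the paper. Balanced accuracy is computed here in its common form, the unweighted mean of per-class recall.

```python
def bottleneck_size(n_maps, freq_bins, time_frames, pool_freq, pool_time):
    """Number of values handed from the convolutional stage to the fully
    connected stage after non-overlapping pooling (floor division, no padding)."""
    return n_maps * (freq_bins // pool_freq) * (time_frames // pool_time)


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == cls)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


# Hypothetical final convolutional output: 64 feature maps,
# each covering 32 frequency bins x 100 time frames.
narrow_pooling = bottleneck_size(64, 32, 100, pool_freq=2, pool_time=2)
broad_pooling = bottleneck_size(64, 32, 100, pool_freq=16, pool_time=50)
print(narrow_pooling, broad_pooling)

# Toy labels (not data from the study): class imbalance is why plain
# accuracy and balanced accuracy can differ.
y_true = ["crying", "crying", "laughing", "laughing", "laughing", "laughing"]
y_pred = ["crying", "fussing", "laughing", "laughing", "laughing", "laughing"]
print(balanced_accuracy(y_true, y_pred))
```

In this sketch, broad pooling across both axes shrinks the input to the fully connected stage by a factor of 200 compared with 2 x 2 pooling, which is the kind of aggregation the abstract recommends; the specific sizes the authors used are not stated in this record.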