Hybrid Multidimensional Deep Convolutional Neural Network for Multimodal Fusion

Authors: Olena Vynokurova, Dmytro Peleshko
Publication year: 2020
Subject:
Source: 2020 IEEE Third International Conference on Data Stream Mining & Processing (DSMP).
DOI: 10.1109/dsmp47368.2020.9204215
Description: A Hybrid Multidimensional Deep Convolutional Neural Network (HMDCNN) topology is proposed for multimodal recognition of speech, face, lips, and human gesture behavior. Here, hybridization means the joint use of 2D and 3D convolutional neural networks in a single multimodal architecture. The research aims to improve the understanding of complex dynamic scenes. The basic unit of the proposed hybrid system is a deep neural network topology that combines 2D and 3D convolutional neural networks (CNNs) for each modality with a proposed intermediate-level feature fusion subsystem. The feature map fusion method is based on a scaling procedure that uses a specific combination of pooling operations with non-square kernels, which allows different types of modalities to be merged.
Database: OpenAIRE
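
The description mentions scaling feature maps with non-square pooling kernels so that different modalities can be merged. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes non-overlapping max pooling, channel-wise stacking as the merge step, and plain nested lists as feature maps. The function names (`max_pool2d`, `fuse`), the map sizes, and the "audio" and "video" labels are hypothetical and do not come from the paper.

```python
# Illustrative sketch only: names, shapes, and kernel sizes are assumptions,
# not taken from the paper.

def max_pool2d(fmap, kernel_h, kernel_w):
    """Non-overlapping max pooling with a (possibly non-square) kernel.

    fmap is a list of rows; its height and width must be divisible by the
    kernel height and width respectively.
    """
    if kernel_h < 1 or kernel_w < 1:
        raise ValueError("kernel dimensions must be at least 1")
    height, width = len(fmap), len(fmap[0])
    if height % kernel_h or width % kernel_w:
        raise ValueError("feature map size must be divisible by the kernel size")
    return [
        [
            max(
                fmap[r + dr][c + dc]
                for dr in range(kernel_h)
                for dc in range(kernel_w)
            )
            for c in range(0, width, kernel_w)
        ]
        for r in range(0, height, kernel_h)
    ]


def fuse(feature_maps, target_h, target_w):
    """Pool each map down to (target_h, target_w), then stack the results as channels."""
    fused = []
    for fmap in feature_maps:
        # The kernel shape follows from the ratio of source size to target size,
        # so maps with different aspect ratios get different (non-square) kernels.
        kernel_h = len(fmap) // target_h
        kernel_w = len(fmap[0]) // target_w
        fused.append(max_pool2d(fmap, kernel_h, kernel_w))
    return fused


# Hypothetical example: a 4x8 "audio" map and a 4x4 "video" map fused at 2x2.
audio_map = [[r * 8 + c for c in range(8)] for r in range(4)]
video_map = [[r * 4 + c for c in range(4)] for r in range(4)]
fused = fuse([audio_map, video_map], target_h=2, target_w=2)
```

In this example the 4x8 map is pooled with a 2x4 kernel and the 4x4 map with a 2x2 kernel, so both end up as 2x2 maps that can be stacked. A real network would apply such pooling to multi-channel tensors inside a deep learning framework, and the actual combination of pooling operations used in HMDCNN is not specified in this record.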