Affective interaction recognition using spatio-temporal features and context
Author: | Jinglian Liang, Chao Xu, Zhiyong Feng, Xirong Ma |
---|---|
Year: | 2016 |
Subject: |
Computer science, feature description, signal processing, artificial intelligence & image processing, computer vision and pattern recognition, data mining, spatial relationship, affective computing, cluster analysis, temporal information, coding |
Source: | Computer Vision and Image Understanding. 144:155-165 |
ISSN: | 1077-3142 |
DOI: | 10.1016/j.cviu.2015.10.008 |
Description: | Highlights: a hierarchical representation structure for interaction recognition is introduced; hierarchical coding models are adopted to encode low-level features; a segmental clustering method is applied to extract mid-level features; contextual information is incorporated with motion features by extracting interactive contours; empirical results are demonstrated on three datasets. This paper focuses on recognizing human interactions related to human emotion and addresses the problem of interaction feature representation. We propose a two-layer feature description structure that hierarchically represents spatio-temporal motion features and context features. On the lower layer, local features for motion and interactive context are extracted separately. We first characterize local spatio-temporal trajectories as motion features. Instead of hand-crafted features, a new hierarchical spatio-temporal trajectory coding model is presented to learn and represent the local spatio-temporal trajectories. To further exploit the spatial and temporal relationships in interactive activities, we then propose an interactive context descriptor, which extracts local interactive contours from frames. These contours implicitly incorporate contextual spatial and temporal information. On the higher layer, semi-global features are built from the local features encoded on the lower layer, and a spatio-temporal segment clustering method is designed for feature extraction on this layer. This method takes the spatial relationship and temporal order of local features into account and creates mid-level motion features and mid-level context features. Experiments are conducted on three challenging video action datasets: HMDB51, Hollywood2 and UT-Interaction. The results demonstrate the efficacy of the proposed structure and validate the effectiveness of the proposed context descriptor. |
Database: | OpenAIRE |
External link: |
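The abstract's spatio-temporal segment clustering step (grouping local features into mid-level features while respecting spatial relationship and temporal order) is not specified in this record. As a rough illustration only, the sketch below clusters hypothetical local trajectory descriptors with plain k-means, appending each descriptor's normalized spatial position and frame index so that cluster assignments account for where and when the feature occurred. All names, parameters, and the choice of k-means are assumptions, not the authors' implementation.

```python
import numpy as np

def cluster_segments(descriptors, positions, times, k=3, iters=50, seed=0):
    """Illustrative mid-level feature extraction via k-means (assumed method).

    descriptors : (n, d) local motion/context features
    positions   : (n, 2) spatial (x, y) location of each feature
    times       : (n,)   frame index of each feature

    Spatial and temporal coordinates are normalized and appended to the
    descriptors, so clustering reflects spatial layout and temporal order.
    """
    t = (times - times.min()) / max(np.ptp(times), 1e-8)
    p = (positions - positions.min(0)) / np.maximum(np.ptp(positions, axis=0), 1e-8)
    X = np.hstack([descriptors, p, t[:, None]])

    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each local feature to its nearest segment center
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)

    # mid-level feature per segment: mean descriptor of its members
    mid = np.stack([descriptors[labels == j].mean(0) if (labels == j).any()
                    else np.zeros(descriptors.shape[1]) for j in range(k)])
    return labels, mid

# toy usage: 60 random local features spread over 30 frames
rng = np.random.default_rng(1)
labels, mid = cluster_segments(rng.normal(size=(60, 16)),
                               rng.uniform(0, 1, size=(60, 2)),
                               np.repeat(np.arange(30), 2), k=3)
print(labels.shape, mid.shape)  # (60,) (3, 16)
```

In the paper's two-layer structure, the resulting mid-level vectors would feed the higher-layer representation; here they are simply cluster means, which is the simplest stand-in for that step.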