Latent semantic learning with time-series cross correlation analysis for video scene detection and classification
Author: | Jui-Yuan Su, Kuei-Fang Hsiao, Habib F. Rashvand, Shyi-Chyi Cheng |
---|---|
Year of publication: | 2015 |
Subject: |
Computer Networks and Communications
Computer science; Computing methodologies: Image processing and computer vision; Codebook; Software engineering; Pattern recognition; Engineering and technology; Support vector machine; Discriminative model; Hardware and Architecture; Gesture recognition; String kernel; Electrical engineering, electronic engineering, information engineering; Media Technology; Semantic learning; Artificial intelligence & image processing; Computer vision; Artificial intelligence; Projection (set theory); Cluster analysis; Software; Gesture |
Source: | Multimedia Tools and Applications. 75:12919-12940 |
ISSN: | 1573-7721 1380-7501 |
DOI: | 10.1007/s11042-015-2548-y |
Description: | This paper presents a novel latent semantic learning method, based on the proposed time-series cross correlation analysis, for extracting a discriminative dynamic scene model to address video event recognition and 3D human body gesture recognition. Typical dynamic texture analysis poses the problems of modeling, learning, recognizing and synthesizing the images of dynamic scenes based on the autoregressive moving average (ARMA) model. Instead of applying the ARMA approach to capture the temporal structure of video sequences, this algorithm uses the learned dynamic scene model to semantically transform video sequences into multiple scenes with lower computational effort. Generating a discriminative dynamic scene model that preserves space-time information is therefore crucial to the success of the proposed latent semantic learning. To achieve this goal, k-medoids clustering with appearance distance metrics is first used to partition all frames of the training video sequences, regardless of their scene types, to provide an initial key-frame codebook. To discover the temporal structure of the dynamic scene model, we develop a time-series cross correlation analysis (TSCCA) for the latent semantic learning, with an alternating dynamic programming (ADP) scheme that embeds the time relationship between the training images into the dynamic scene model. We also tackle a known problem of dynamic programming, which tends to produce large temporal misalignment for periodic activities. Moreover, the discriminative power of the model is estimated by a deterministic projection-based learning algorithm. Finally, based on the learned dynamic scene model, this paper uses a support vector machine (SVM) with a two-channel string kernel for video scene classification. Two test datasets, one for video event classification and the other for 3D human body gesture recognition, are used to verify the effectiveness of the proposed approach. Experimental results demonstrate that the proposed algorithm obtains good performance in terms of classification accuracy. |
Database: | OpenAIRE |