Static video summarization using multi-CNN with sparse autoencoder and random forest classifier

Authors: Jesna Mohan, Madhu S. Nair
Publication year: 2020
Subject:
Source: Signal, Image and Video Processing. 15:735-742
ISSN: 1863-1711, 1863-1703
DOI: 10.1007/s11760-020-01791-4
Description: A video summarization system detects the parts of an input video that carry its essential message and aims to generate a compact, meaningful representation of the original. This paper presents a novel method for detecting key-frames for static summarization. Feature vectors are extracted using four pre-trained Convolutional Neural Network models (Multi-CNN). These vectors are fed to a Sparse Autoencoder, which outputs a combined representation of the input feature vectors. A Random Forest classifier then extracts the key-frames of the input video from the combined feature vectors. The method is evaluated on two datasets, VSUMM and OVP, against the user summaries provided in the ground truth. It achieves an average F-score of 0.83 on the VSUMM dataset and 0.82 on the OVP dataset, and attains promising results compared with other state-of-the-art methods in the literature. The Multi-CNN model also generates high-quality summaries consistently across videos of all categories. Further experiments show that the Multi-CNN model combined with the Random Forest classifier performs better than the other classifiers considered in the study.
Database: OpenAIRE
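
Note on the reported metric: the description cites average F-scores measured against user summaries. The record does not say how automatic key-frames are matched to user key-frames, so the sketch below assumes the match counts are already known and only illustrates the standard F-score (harmonic mean of precision and recall) and its average over several user summaries. The function names `f_score` and `mean_f_score` are hypothetical and are not taken from the paper.

```python
def f_score(n_matched: int, n_auto: int, n_user: int) -> float:
    """F-score for one automatic summary compared with one user summary.

    n_matched: automatic key-frames that match a user key-frame
               (the matching rule itself is an assumption left to the caller)
    n_auto:    number of key-frames in the automatic summary
    n_user:    number of key-frames in the user summary
    """
    if n_matched == 0 or n_auto == 0 or n_user == 0:
        return 0.0
    precision = n_matched / n_auto
    recall = n_matched / n_user
    return 2 * precision * recall / (precision + recall)


def mean_f_score(pairs) -> float:
    """Average F-score over (n_matched, n_auto, n_user) tuples."""
    scores = [f_score(*p) for p in pairs]
    return sum(scores) / len(scores) if scores else 0.0
```

For instance, an automatic summary of 5 key-frames with 4 matches against a 5-frame user summary has precision and recall of 0.8 each, so `f_score(4, 5, 5)` is also 0.8.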