A supervised deep convolutional based bidirectional long short term memory video hashing for large scale video retrieval applications
Author: H. Srimathi, R. Anuranji
Year of publication: 2020
Subject: Computer science; business; Applied Mathematics; Deep learning; Dimensionality reduction; Hash function; 020206 networking & telecommunications; Pattern recognition; 02 engineering and technology; Computational Theory and Mathematics; Discriminative model; Artificial Intelligence; Signal Processing; Scalability; 0202 electrical engineering, electronic engineering, information engineering; Feature (machine learning); 020201 artificial intelligence & image processing; Binary code; Computer Vision and Pattern Recognition; Artificial intelligence; Electrical and Electronic Engineering; Statistics, Probability and Uncertainty; Network model
Source: Digital Signal Processing. 102:102729
ISSN: 1051-2004
DOI: 10.1016/j.dsp.2020.102729
Description: Large-scale video content retrieval has recently gained attention because of the vast amount of user-generated image and video content available on the internet. Hashing is an effective technique that encodes high-dimensional feature vectors into compact binary codes. The aim of hashing is to generate short binary codes and map similar videos to similar hash values, so that related videos can be retrieved from the database with a minimum distance measure. Deep learning-based hashing networks are employed to learn representative video features from which the hash functions are estimated. However, existing hashing approaches rely on frame-level feature representations and do not effectively exploit the temporal features needed for visual search. Furthermore, significant loss of features during the dimensionality reduction step lowers accuracy. It is therefore essential to develop a deep learning-based hashing framework that exploits both strong spatial and strong temporal features for scalable video search. The main objective is to learn high-dimensional features from the entire video and derive compact binary codes that retrieve videos similar to the input query sequence. In this paper, we propose a joint supervised network model, a stacked heterogeneous convolutional multi-kernel (Stacked HetConv-MK) network combined with a bidirectional Long Short-Term Memory (BiDLSTM) network, that encodes both the rich structural and the discriminative features of a video sequence to estimate compact binary codes. First, the video frames are passed through stacked convolutional networks with heterogeneous kernel sizes and residual learning, which extract spatial features at different views of the video sequence and improve learning efficiency. Then, the bidirectional network processes the sequence in both forward and backward directions and produces a series of hidden-state outputs. Finally, a fully connected layer with an activation unit performs the hashing and learns the codes for each video. Experimental analysis on three datasets shows better accuracy than other state-of-the-art approaches. A hedged code sketch of this pipeline is given after the record below.
Database: OpenAIRE
External link:
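The abstract outlines a concrete pipeline: frame-level features from stacked convolutions with heterogeneous kernel sizes and residual connections, a bidirectional LSTM over the frame sequence, and a fully connected layer with an activation unit that yields the hash code. The paper's exact architecture, kernel sizes, pooling, and training loss are not given in this record, so the PyTorch sketch below is only a minimal, hypothetical illustration of that flow: `HetConvBlock`, `VideoHashNet`, the 3x3/5x5 kernel pair, the temporal mean pooling, and the tanh-then-sign binarization are all assumptions made for illustration, not the authors' implementation.

```python
# Hypothetical sketch of the Stacked HetConv-MK + BiDLSTM hashing pipeline
# described in the abstract. Every layer choice here is an assumption.
import torch
import torch.nn as nn


class HetConvBlock(nn.Module):
    """Parallel convolutions with heterogeneous kernel sizes plus a residual
    connection (an assumed stand-in for the paper's HetConv-MK block)."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.branch3 = nn.Conv2d(in_ch, out_ch // 2, kernel_size=3, padding=1)
        self.branch5 = nn.Conv2d(in_ch, out_ch // 2, kernel_size=5, padding=2)
        self.proj = nn.Conv2d(in_ch, out_ch, kernel_size=1)  # residual projection
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        y = torch.cat([self.branch3(x), self.branch5(x)], dim=1)
        return self.act(y + self.proj(x))  # residual learning


class VideoHashNet(nn.Module):
    """Frame-level HetConv features -> BiLSTM over time -> FC hashing layer."""

    def __init__(self, hash_bits=64, feat_dim=128, hidden=256):
        super().__init__()
        self.cnn = nn.Sequential(
            HetConvBlock(3, 64), nn.MaxPool2d(2),
            HetConvBlock(64, feat_dim), nn.AdaptiveAvgPool2d(1),
        )
        self.bilstm = nn.LSTM(feat_dim, hidden, batch_first=True,
                              bidirectional=True)
        # Fully connected layer with tanh activation gives a relaxed code for
        # training; sign() binarizes it at indexing/retrieval time.
        self.fc = nn.Linear(2 * hidden, hash_bits)

    def forward(self, frames):                      # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        f = self.cnn(frames.flatten(0, 1))          # (B*T, feat_dim, 1, 1)
        f = f.flatten(1).view(b, t, -1)             # (B, T, feat_dim)
        h, _ = self.bilstm(f)                       # forward + backward states
        return torch.tanh(self.fc(h.mean(dim=1)))   # relaxed code in (-1, 1)

    @torch.no_grad()
    def hash(self, frames):
        return torch.sign(self.forward(frames))     # compact binary code {-1, +1}


# Retrieval sketch: rank database videos by Hamming distance to the query code.
if __name__ == "__main__":
    net = VideoHashNet(hash_bits=64)
    query = torch.randn(1, 16, 3, 64, 64)           # one clip of 16 frames
    database = torch.randn(8, 16, 3, 64, 64)        # eight database clips
    q, db = net.hash(query), net.hash(database)
    hamming = (q != db).sum(dim=1)                  # distance per database video
    print(hamming.argsort()[:5])                    # indices of nearest videos
```

Retrieval in this sketch ranks database videos by the Hamming distance between their binary codes and the query code, matching the minimum-distance retrieval described in the abstract; a real system would first train the relaxed codes with a supervised similarity or classification loss before binarizing them.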