Authors: |
Hosseini, Matin; Maida, Anthony; Hosseini, Seyedmajid; Gottumukkala, Raju |
Source: |
SN Computer Science; January 2023, Vol. 4 Issue: 1 |
Abstract: |
Video frame prediction is needed in various computer-vision-based systems, such as self-driving vehicles and video streaming. This paper proposes a novel Inception-based convolutional recurrent neural network (RNN) as an enhancement to a basic gated convolutional RNN. A basic gated convolutional RNN has fixed-size kernels whose size is a hyperparameter of the network. Our model replaces the single-size kernel in the convolutional RNN with Inception-like multi-channel kernels. Compared with a single kernel size, multiple kernel sizes allow the model to capture the spatio-temporal dynamics of multiple objects in the video. Our model is tested within a predictive coding framework to improve video frame prediction. We seek to determine whether multi-kernel convolutional gated RNNs improve performance compared to basic convolutional RNNs. We study different variants of the proposed multi-kernel convolutional RNNs, namely LSTM and GRU, with both Inception V1 and Inception V2 configurations. We observe that the proposed models improve video frame prediction performance over existing PredNet-based video prediction methods, at a minor additional cost in training time. |
Database: |
Supplemental Index |
External link: |
|
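
Illustrative note: the abstract describes replacing the single fixed-size kernel of a gated convolutional RNN with Inception-like parallel kernels of several sizes. The sketch below shows that idea for one gate only, on single-channel frames in plain Python. It is not the authors' implementation: the function names (`conv2d_same`, `multi_kernel_gate`, `box_kernel`), the box-filter weights, the kernel sizes 1, 3 and 5, and the per-pixel weighted sum that mixes the branches (standing in for a 1x1 convolution over concatenated channels) are all assumptions made for illustration.

```python
import math


def conv2d_same(frame, kernel):
    """Zero-padded 'same' 2D cross-correlation for one channel and an odd square kernel."""
    height, width = len(frame), len(frame[0])
    size = len(kernel)
    radius = size // 2
    out = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            acc = 0.0
            for di in range(size):
                for dj in range(size):
                    y, x = i + di - radius, j + dj - radius
                    # Positions outside the frame contribute zero (zero padding).
                    if 0 <= y < height and 0 <= x < width:
                        acc += frame[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def multi_kernel_gate(frame, hidden, kernels, mix_weights, bias=0.0):
    """Sigmoid gate computed from parallel convolutions of several kernel sizes.

    Hypothetical helper: each kernel is applied to both the input frame and the
    previous hidden state, and the resulting branches are mixed per pixel.
    """
    branches = [conv2d_same(frame, k) for k in kernels]
    branches += [conv2d_same(hidden, k) for k in kernels]
    if len(mix_weights) != len(branches):
        raise ValueError("need one mixing weight per branch")
    height, width = len(frame), len(frame[0])
    gate = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            # Weighted sum over branches at one pixel, then the sigmoid nonlinearity.
            z = bias + sum(w * b[i][j] for w, b in zip(mix_weights, branches))
            gate[i][j] = 1.0 / (1.0 + math.exp(-z))
    return gate


def box_kernel(size):
    """Averaging kernel of the given odd size (assumed weights; real ones are learned)."""
    return [[1.0 / (size * size)] * size for _ in range(size)]


# Usage: a 5x5 frame with a single bright pixel and an all-zero hidden state.
frame = [[0.0] * 5 for _ in range(5)]
frame[2][2] = 1.0
hidden = [[0.0] * 5 for _ in range(5)]
kernels = [box_kernel(1), box_kernel(3), box_kernel(5)]
gate = multi_kernel_gate(frame, hidden, kernels, mix_weights=[1.0] * 6)
```

In this toy setting the 1x1 branch responds only at the bright pixel, while the 5x5 branch spreads its response to the frame corners, which is the sense in which several kernel sizes can cover structure at several spatial scales. A full LSTM or GRU cell would compute each of its gates and its candidate state in this way, with learned multi-channel weights.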