3D deformable convolution for action classification in videos

Authors: Phooi Yee Lau, Hung-Khoon Tan, Siew Cheng Lai
Publication year: 2021
Subject:
Source: International Workshop on Advanced Imaging Technology (IWAIT) 2021.
DOI: 10.1117/12.2591088
Description: Action recognition is a popular research area in computer vision because it can be applied to many problems, especially in security surveillance, behavior analysis, and healthcare. Well-known Convolutional Neural Networks (CNNs) that use 3D convolution for action classification include C3D, I3D and R(2+1)D. These 3D CNNs assume that motion is uniform across the spatial and temporal dimensions, so their 3D filters are uniformly shaped. However, motion can follow a path in any direction, and a uniformly shaped filter may fail to capture non-uniform spatial motion, which limits classification performance. To address this problem, we incorporate a 3D deformable filter into a C3D network for action classification. The deformable convolution adds offsets to the regular grid sampling locations of the standard convolution, resulting in non-uniform sampling locations. We also investigate the performance of the network when the 3D deformable convolution is applied in different layers, and the effect of different dilation sizes of the 3D deformable filter. The UCF101 dataset is used in the experiments. Our experiments show that applying the deformable convolution in a lower layer yields better results than applying it in other layers. With the deformable convolution placed in Conv1a, the accuracy achieved is 48.50%.
Database: OpenAIRE
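
The sampling idea in the description (offsets added to the regular grid locations of a standard 3D convolution, with an optional dilation) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the single-channel (T, H, W) volume, the zero padding, and the use of trilinear interpolation for fractional locations are all assumptions made for illustration, and the offsets are passed in by hand because the record does not say how the network obtains them.

```python
import itertools
import numpy as np

def regular_grid(kernel_size=3, dilation=1):
    """Return the (k^3, 3) sampling offsets of a standard 3D kernel around its centre."""
    r = kernel_size // 2
    axis = np.arange(-r, r + 1) * dilation
    # product() varies the last axis fastest, matching C-order flattening of the weights.
    return np.array(list(itertools.product(axis, axis, axis)), dtype=float)

def trilinear_sample(volume, point):
    """Interpolate a (T, H, W) volume at a fractional point; outside counts as zero."""
    base = np.floor(point).astype(int)
    frac = point - base
    value = 0.0
    for corner in itertools.product((0, 1), repeat=3):
        corner = np.array(corner)
        idx = base + corner
        if np.any(idx < 0) or np.any(idx >= np.array(volume.shape)):
            continue  # assumed zero padding outside the volume
        weight = np.prod(np.where(corner == 1, frac, 1.0 - frac))
        value += weight * volume[tuple(idx)]
    return value

def deformable_conv3d_at(volume, weights, offsets, centre, dilation=1):
    """Compute one output value of a (hypothetical) 3D deformable convolution.

    weights: (k, k, k) filter; offsets: (k^3, 3) displacements added to the
    regular grid, so the sampling locations are no longer uniformly spaced.
    """
    grid = regular_grid(weights.shape[0], dilation)
    locations = np.asarray(centre, dtype=float) + grid + offsets
    samples = np.array([trilinear_sample(volume, p) for p in locations])
    return float(np.dot(weights.reshape(-1), samples))

# Usage: with all-zero offsets the result reduces to a standard 3D convolution.
volume = np.arange(125, dtype=float).reshape(5, 5, 5)
weights = np.ones((3, 3, 3))
zero_offsets = np.zeros((27, 3))
standard = deformable_conv3d_at(volume, weights, zero_offsets, centre=(2, 2, 2))
```

With zero offsets the sketch samples exactly the regular grid, which makes it easy to check against a plain 3D convolution before experimenting with non-zero offsets or larger dilation values.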