Abstrakt: |
Convolutional Neural Networks (CNNs) have received growing research attention for Stereoscopic Video Quality Assessment (SVQA) in recent years. Researchers have recently used 3D CNNs to extract spatial and temporal features from stereo videos and to detect quality degradation in them. To the best of our knowledge, transfer learning (TL) has not been well examined in SVQA. Pretraining and fine-tuning are approaches used in deep neural networks to transfer knowledge learned in other, more general domains. Previous methods that utilized TL relied on very heavy 3D ResNet architectures with many layers and are therefore very time-consuming. In this paper, we develop a new model for SVQA that uses the Inflated 3-Dimensional ConvNet (I3D) network as its backbone feature extractor. We first feed the left and right videos to I3D models to extract their features. Then, we apply 3D CNNs to learn quality-aware features from the stereo videos. We evaluate the proposed method on the LFOVIAS3DPh2 and NAMA3DS1-COSPAD1 SVQA datasets. Extensive experiments on these two datasets show that the predictions of the proposed method correlate with the subjective results. The Root-Mean-Square Error (RMSE) on the NAMA3DS1-COSPAD1 dataset is 0.2454, and the high Linear Correlation Coefficient (LCC) and Spearman Rank-Order Correlation Coefficient (SROCC) values (0.895 and 0.901, respectively) on the LFOVIAS3DPh2 dataset show that the results are consistent with the human visual system (HVS). Despite having a lighter architecture than the best-performing method, the proposed method outperforms most existing methods and is overall the second-best-performing method available. |
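The three evaluation measures named in the abstract (LCC, SROCC, and RMSE) compare predicted quality scores against subjective scores. The following is a minimal sketch of how they can be computed, using only the Python standard library. The function names and the sample scores are hypothetical and are not taken from the paper. The sketch also omits any nonlinear mapping of predictions onto the subjective scale before LCC and RMSE are computed; the abstract does not say whether such a step was applied.

```python
import math


def pearson(x, y):
    """Linear Correlation Coefficient (LCC) between two score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """SROCC: the Pearson correlation of the rank-transformed scores."""
    return pearson(average_ranks(x), average_ranks(y))


def rmse(x, y):
    """Root-Mean-Square Error between predictions and subjective scores."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Hypothetical scores for illustration only (not results from the paper).
predicted = [1.0, 2.0, 3.0, 4.0, 5.0]
subjective = [1.5, 2.5, 3.5, 4.5, 5.5]

lcc = pearson(predicted, subjective)
srocc = spearman(predicted, subjective)
error = rmse(predicted, subjective)
```

LCC and RMSE are sensitive to the scale of the predictions, whereas SROCC depends only on their ordering, which is why a ranking-based and a value-based measure are usually reported together.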