Optimizing Fixation Prediction Using Recurrent Neural Networks for $360^{\circ}$ Video Streaming in Head-Mounted Virtual Reality
Author: | Chun-Ying Huang, Cheng-Hsin Hsu, Shou-Cheng Yen, Ching-Ling Fan |
---|---|
Year of publication: | 2020 |
Subject: | Computer science; Feature extraction; 02 engineering and technology; Virtual reality; Video quality; Object detection; Computer Science Applications; Recurrent neural network; Distortion; Signal Processing; Fixation (visual); 0202 electrical engineering, electronic engineering, information engineering; Media Technology; Cellular network; 020201 artificial intelligence & image processing; Electrical and Electronic Engineering; Face detection; Algorithm |
Source: | IEEE Transactions on Multimedia. 22:744–759 |
ISSN: | 1941-0077; 1520-9210 |
DOI: | 10.1109/tmm.2019.2931807 |
Description: | We study the problem of predicting the viewing probability of different parts of $360^{\circ}$ videos when streaming them to head-mounted displays. We propose a fixation prediction network based on a recurrent neural network, which leverages both sensor and content features. The content features are derived by computer vision (CV) algorithms, which may suffer from inferior performance due to the various types of distortion caused by diverse $360^{\circ}$ video projection models. We propose a unified approach with overlapping virtual viewports to eliminate such negative effects, and we evaluate our proposed solution using several CV algorithms, such as saliency detection, face detection, and object detection. We find that overlapping virtual viewports improve the performance of existing CV algorithms that were not trained for $360^{\circ}$ videos. We next fine-tune our fixation prediction network with diverse design options, including: 1) with or without overlapping virtual viewports, 2) with or without future content features, and 3) different feature sampling rates. We empirically choose the best fixation prediction network and use it in a $360^{\circ}$ video streaming system. We conduct extensive trace-driven simulations with a large-scale dataset to quantify the performance of the $360^{\circ}$ video streaming system under different fixation prediction algorithms. The results show that our proposed fixation prediction network outperforms other algorithms in several aspects, such as: 1) achieving comparable video quality (average gaps between −0.05 and 0.92 dB), 2) consuming much less bandwidth (average bandwidth reduction of up to 8 Mb/s), 3) reducing the rebuffering time (by 40 s on average in bandwidth-limited 4G cellular networks), and 4) running in real time (in at most 124 ms). |
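The record contains no implementation details beyond the abstract. Purely as a rough illustration of the stated idea, the sketch below (assuming PyTorch; the feature dimensions, tile grid, and layer sizes such as `sensor_dim` and `num_tiles` are hypothetical, not the paper's architecture) shows how a recurrent network could fuse per-frame HMD sensor features with per-tile content features and emit per-tile viewing probabilities.

```python
# Minimal sketch of an RNN-based fixation prediction network, assuming
# PyTorch. Feature dimensions, tile grid, and layer sizes are hypothetical
# illustrations, not the paper's actual architecture.
import torch
import torch.nn as nn

class FixationPredictor(nn.Module):
    def __init__(self, sensor_dim=4, num_tiles=192, hidden_dim=256):
        super().__init__()
        # Per time step: HMD sensor features (e.g., orientation) concatenated
        # with per-tile content features (e.g., saliency/detection scores).
        self.rnn = nn.LSTM(sensor_dim + num_tiles, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_tiles)

    def forward(self, sensor_feats, content_feats):
        # sensor_feats: (batch, time, sensor_dim)
        # content_feats: (batch, time, num_tiles)
        x = torch.cat([sensor_feats, content_feats], dim=-1)
        out, _ = self.rnn(x)
        # Viewing probability of each tile at each time step.
        return torch.sigmoid(self.head(out))

model = FixationPredictor()
probs = model(torch.randn(2, 30, 4), torch.rand(2, 30, 192))
print(probs.shape)  # torch.Size([2, 30, 192])
```

A sigmoid head (rather than a softmax) is used here because several tiles can fall inside the viewport at once, so per-tile viewing probabilities need not sum to one.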
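Similarly, the overlapping-virtual-viewport idea amounts to rendering low-distortion perspective views out of the equirectangular frame so that off-the-shelf CV models can be applied to them. The following is a minimal NumPy sketch under assumed parameters (the viewport count, field of view, and overlap are illustrative choices, not the paper's exact configuration).

```python
# Sketch of extracting overlapping virtual viewports from an equirectangular
# frame via rectilinear (gnomonic) projection, so off-the-shelf CV models see
# low-distortion perspective views. Viewport count, FOV, and overlap are
# hypothetical choices, not the paper's exact configuration.
import numpy as np

def viewport(equi, yaw, pitch, fov=np.deg2rad(90), size=224):
    """Sample a size x size perspective viewport centered at (yaw, pitch)."""
    H, W = equi.shape[:2]
    half = np.tan(fov / 2)
    u, v = np.meshgrid(np.linspace(-half, half, size),
                       np.linspace(-half, half, size))
    # Ray directions in camera space (z forward), rotated by pitch then yaw.
    d = np.stack([u, -v, np.ones_like(u)], axis=-1)
    d /= np.linalg.norm(d, axis=-1, keepdims=True)
    cp, sp, cy, sy = np.cos(pitch), np.sin(pitch), np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])   # pitch
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])   # yaw
    d = d @ (Ry @ Rx).T
    # Convert ray directions to longitude/latitude, then to pixel indices.
    lon = np.arctan2(d[..., 0], d[..., 2])          # [-pi, pi]
    lat = np.arcsin(np.clip(d[..., 1], -1, 1))      # [-pi/2, pi/2]
    px = ((lon / np.pi + 1) / 2 * (W - 1)).astype(int)
    py = ((0.5 - lat / np.pi) * (H - 1)).astype(int)
    return equi[py, px]

# Eight yaw positions with a 90-degree FOV give 45 degrees of overlap between
# neighboring viewports along the equator.
frame = np.zeros((960, 1920, 3), dtype=np.uint8)
views = [viewport(frame, yaw, 0.0)
         for yaw in np.linspace(-np.pi, np.pi, 8, endpoint=False)]
```

Detections made in each perspective view would then be projected back onto the sphere and merged across the overlapping regions.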
Database: | OpenAIRE |
External link: |