Automatic CNN-Based Enhancement of 360° Video Experience With Multisensorial Effects
Author: | Gabriel-Miro Muntean, Anderson Augusto Simiscuka, John Patrick Sexton, Kevin McGuinness |
Language: | English |
Subject: | General Computer Science; General Engineering; General Materials Science; Computer science; Virtual reality; Convolutional neural network; Synchronization; Cube mapping; Latency (audio); Computer vision; Artificial intelligence; Timestamp; Digital signal processing; Haptic technology |
Source: | IEEE Access. 9:133156-133169 |
ISSN: | 2169-3536 |
DOI: | 10.1109/access.2021.3115701 |
Description: | High-resolution audio-visual virtual reality (VR) technologies currently offer satisfying experiences for both the sight and hearing senses in the world of multimedia. However, the delivery of truly immersive experiences requires the incorporation of other senses, such as touch and smell. Multisensorial effects are usually synchronized with videos manually, with effect data stored in companion files that contain timestamps for these effects. This manual task becomes very complex for 360° videos, as the scenes triggering effects can occur in different viewpoints. The solution proposed in this paper aims to automatically add extra sensory information to immersive 360° videos. A novel scent prediction scheme using Convolutional Neural Networks (CNN) is proposed to perform scene predictions on 360° videos represented in the Equi-Angular Cubemap format, in order to add scents relevant to the detected content. Digital signal processing with a Root Mean Square (RMS) function is used to detect loud sounds in the video, which are then associated with haptic feedback. A prototype was developed that outputs multisensorial stimuli using an olfaction dispenser and a haptic mouse. The proposed solution has been tested and achieved excellent results in terms of scene detection accuracy, olfaction latency, and correct execution of the relevant effects. Different CNN architectures, including AlexNet, ResNet18 and ResNet50, were also assessed comparatively, achieving a labeling accuracy of up to 72.67% for olfaction-enhanced media. |
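The RMS-based loud-sound detection the abstract describes can be illustrated with a minimal sketch. This is not the paper's implementation: the function name, frame size, and threshold below are illustrative assumptions; it simply computes the RMS energy of fixed-size audio frames and flags those above a threshold, which could then be mapped to haptic triggers.

```python
import math

def detect_loud_frames(samples, frame_size=1024, threshold=0.3):
    """Return indices of audio frames whose RMS energy exceeds a threshold.

    `samples` is a sequence of floats normalised to [-1, 1]. The frame
    size and threshold are hypothetical values for illustration only.
    """
    loud = []
    for i in range(len(samples) // frame_size):
        frame = samples[i * frame_size:(i + 1) * frame_size]
        rms = math.sqrt(sum(s * s for s in frame) / frame_size)
        if rms > threshold:
            loud.append(i)
    return loud

# Example: two quiet frames, one loud burst, one quiet frame.
audio = [0.01] * 2048 + [0.8] * 1024 + [0.01] * 1024
print(detect_loud_frames(audio, frame_size=1024))  # → [2]
```

Each flagged frame index can be converted to a timestamp (index × frame_size / sample_rate) for synchronizing the haptic effect with the video.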
Database: | OpenAIRE |
External link: |