RealHePoNet: a robust single-stage ConvNet for head pose estimation in the wild

Authors:	Rafael Muñoz-Salinas, Manuel J. Marín-Jiménez, Rafael Berral-Soler, F. J. Madrid-Cuevas
Year of publication:	2020
Subject:	FOS: Computer and information sciences; 0209 industrial biotechnology; Computer Science - Artificial Intelligence; Computer science; business.industry; Computer Vision and Pattern Recognition (cs.CV); Computer Science - Computer Vision and Pattern Recognition; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; 02 engineering and technology; Convolutional neural network; 020901 industrial engineering & automation; Artificial Intelligence (cs.AI); Artificial Intelligence; Robustness (computer science); 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Computer vision; Artificial intelligence; business; Pose; Software
DOI:	10.48550/arxiv.2011.01890
Description:	Human head pose estimation in images has applications in many fields, such as human-computer interaction and video surveillance. In this work, we address this problem, defined here as the estimation of both the vertical (tilt/pitch) and horizontal (pan/yaw) angles, using a single Convolutional Neural Network (ConvNet) model, aiming to balance precision and inference speed in order to maximize its usability in real-world applications. Our model is trained on the combination of two datasets: 'Pointing'04' (to cover a wide range of poses) and 'Annotated Facial Landmarks in the Wild' (to improve the robustness of our model on real-world images). Three different partitions of the combined dataset are defined and used for training, validation and testing. As a result of this work, we have obtained a trained ConvNet model, coined RealHePoNet, that, given a low-resolution grayscale input image and without requiring facial landmarks, estimates both tilt and pan angles with low error (~4.4° average error on the test partition). Also, given its low inference time (~6 ms per head), we consider our model usable even on medium-spec hardware (e.g. a GTX 1060 GPU).
* Code available at: https://github.com/rafabs97/headpose_final
* Demo video at: https://www.youtube.com/watch?v=2UeuXh5DjAE
Comment: Accepted for publication at Neural Computing and Applications
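The abstract reports a single average error figure (~4.4°) covering both angles but does not state exactly how it is computed. As an illustration only, the sketch below assumes the figure is the mean absolute error in degrees, taken over the tilt and pan predictions of every test sample. The function name `angular_errors` and the (tilt, pan) tuple layout are hypothetical and are not taken from the authors' code.

```python
from statistics import mean


def angular_errors(pred, true):
    """Hypothetical helper: mean absolute error (degrees) for tilt and pan.

    pred, true: sequences of (tilt, pan) pairs in degrees, one per head.
    Returns (tilt_mae, pan_mae, overall_mae), where overall_mae averages
    all 2N absolute errors (assumed definition of "average error").
    """
    if len(pred) != len(true) or not pred:
        raise ValueError("pred and true must be non-empty and equally long")
    tilt_err = [abs(p[0] - t[0]) for p, t in zip(pred, true)]
    pan_err = [abs(p[1] - t[1]) for p, t in zip(pred, true)]
    tilt_mae, pan_mae = mean(tilt_err), mean(pan_err)
    # Both lists have N entries, so the mean over all 2N errors
    # equals the mean of the two per-angle MAEs.
    return tilt_mae, pan_mae, (tilt_mae + pan_mae) / 2


# Two heads: predicted vs. ground-truth (tilt, pan) in degrees.
print(angular_errors([(10.0, -20.0), (0.0, 30.0)],
                     [(14.0, -25.0), (0.0, 33.0)]))  # (2.0, 4.0, 3.0)
```

Reporting the per-angle errors alongside the combined figure makes it visible whether one angle (typically pan, which spans a wider range) dominates the average.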
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::43f66f06a5fd58ae93c1c7b9f42747b8