Efficient face detection and tracking in video sequences based on deep learning
Author: | Guangyong Zheng, Yuming Xu |
---|---|
Year of publication: | 2021 |
Subject: | Information Systems and Management; Channel (digital image); Facial motion capture; Computer science; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; Initialization; 02 engineering and technology; Feature scaling; Theoretical Computer Science; Artificial Intelligence; 0202 electrical engineering, electronic engineering, information engineering; Computer vision; Face detection; Deep learning; 05 social sciences; Frame (networking); 050301 education; Computer Science Applications; Control and Systems Engineering; Face (geometry); 020201 artificial intelligence & image processing; 0503 education; Software |
Source: | Information Sciences. 568:265-285 |
ISSN: | 0020-0255 |
Description: | Video-based face detection and tracking technology has been widely used in video surveillance, safe driving, and medical diagnosis. In video sequences, most existing face detection and tracking methods suffer from interference caused by occlusion, ambient illumination, and changes in human posture. To accurately track human faces in video sequences, we propose an efficient face detection and tracking framework based on deep learning, which includes a SENResNet face detection model and a Regression Network-based Face Tracking (RNFT) model. First, the SENResNet model integrates the Squeeze and Excitation Network (SEN) with the Residual Neural Network (ResNet). Because deep neural networks are difficult to train, we use ResNet to overcome the vanishing-gradient problem in deep network training. To fuse the features of each channel during the convolution operation, we further integrate the SEN module into the SENResNet model. SENResNet accurately detects facial information in each frame and extracts the position of the target face, thereby providing an initialization window for face tracking. Then, the RNFT model extracts facial features from adjacent frames and predicts the position of the target face in the next frame. To address the problem of feature scaling, we add a correction network to the RNFT model. The improved RNFT model extracts the rectangular frame of the target face in the previous frame and strengthens the perception of feature scaling, thereby improving its accuracy. Extensive experimental results on public facial and video datasets show that the proposed SENResNet and RNFT models are superior to the state-of-the-art comparison methods in terms of accuracy and performance. |
Database: | OpenAIRE |
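
The abstract describes combining a squeeze-and-excitation module with a residual network but gives no architectural detail. The sketch below illustrates the generic squeeze-and-excitation residual block as it is commonly described, not the authors' SENResNet: each channel is averaged to one value (squeeze), two small fully connected layers turn those values into per-channel gates in (0, 1) (excitation), the branch output is scaled by the gates, and the input is added back through the skip connection. All names (`squeeze`, `excite`, `se_residual_block`, `branch`) and the toy weights are assumptions made for illustration; the convolutional branch is stood in for by a caller-supplied function.

```python
import math
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]
Tensor = List[List[List[float]]]  # channels x height x width


def squeeze(x: Tensor) -> Vector:
    """Squeeze step: global average pooling, one value per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]


def dense(v: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * a for w, a in zip(row, v)) + b
            for row, b in zip(weights, bias)]


def excite(z: Vector, w1: Matrix, b1: Vector, w2: Matrix, b2: Vector) -> Vector:
    """Excitation step: FC -> ReLU -> FC -> sigmoid, giving one gate per channel."""
    hidden = [max(0.0, h) for h in dense(z, w1, b1)]
    return [1.0 / (1.0 + math.exp(-s)) for s in dense(hidden, w2, b2)]


def se_residual_block(x: Tensor, branch: Callable[[Tensor], Tensor],
                      w1: Matrix, b1: Vector, w2: Matrix, b2: Vector) -> Tensor:
    """Compute relu(gates * branch(x) + x) with per-channel gates.

    `branch` is a hypothetical stand-in for the convolutional layers and
    must return a tensor with the same shape as `x`.
    """
    f = branch(x)
    gates = excite(squeeze(f), w1, b1, w2, b2)
    return [
        [[max(0.0, g * fv + xv) for fv, xv in zip(f_row, x_row)]
         for f_row, x_row in zip(f_ch, x_ch)]
        for g, f_ch, x_ch in zip(gates, f, x)
    ]


# Toy input: 2 channels of size 2x2.
x = [[[1.0, 2.0], [3.0, 4.0]],
     [[0.0, 0.0], [0.0, 8.0]]]

# Toy weights (assumed): 2 channels -> 1 hidden unit -> 2 channels.
# All-zero weights make every gate sigmoid(0) = 0.5.
w1, b1 = [[0.0, 0.0]], [0.0]
w2, b2 = [[0.0], [0.0]], [0.0, 0.0]

# Identity function in place of the convolutional branch.
out = se_residual_block(x, lambda t: t, w1, b1, w2, b2)
```

With these toy weights every gate is 0.5, so each output element is 0.5 times the branch value plus the skip value. In a trained network the gates differ per channel, which is how the block reweights channels by learned importance, while the skip connection keeps a direct path for gradients.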