Dual-Triplet Metric Learning for Unsupervised Domain Adaptation in Video Face Recognition
Author: | George Ekladious, Eric Granger, Hugo Lemoine, Salim Moudache, Kaveh Kamali |
---|---|
Year of publication: | 2020 |
Subject: | 021110 strategic defence & security studies; business.industry; Computer science; Deep learning; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; 0211 other engineering and technologies; Video camera; 02 engineering and technology; Facial recognition system; law.invention; Domain (software engineering); law; Metric (mathematics); 0202 electrical engineering, electronic engineering, information engineering; Embedding; 020201 artificial intelligence & image processing; Computer vision; Artificial intelligence; business; Camera resectioning |
Source: | International Joint Conference on Neural Networks (IJCNN) |
DOI: | 10.1109/ijcnn48605.2020.9206794 |
Description: | The scalability and complexity of deep learning models remain a key issue in many visual recognition applications. For instance, in video surveillance, a model must be fine-tuned with labeled image data from each new camera to reduce the domain shift between videos captured in the source domain (laboratory setting) and the target domain (operational environment). In many video surveillance applications, such as face recognition and person re-identification, a pairwise matcher is typically employed to assign a query image captured by a video camera to the corresponding reference images in a gallery. Differences in the configuration, viewpoint, and operational conditions of each camera can introduce significant shifts in pairwise distance distributions, degrading recognition performance on new cameras. In this paper, a new deep domain adaptation (DA) method is proposed to adapt the CNN embedding of a Siamese network using unlabeled tracklets captured with a new video camera. To this end, a dual-triplet loss is introduced for metric learning, in which two triplets are constructed from video data of a source camera and of a new target camera, respectively. To form the dual triplets, a mutual-supervised learning approach is introduced in which the source camera acts as a teacher, providing the target camera with an initial embedding. The student then relies on the teacher to iteratively label the positive and negative pairs collected during, e.g., initial camera calibration. Both the source and target embeddings continue to learn simultaneously so that their pairwise distance distributions become aligned. For validation, the proposed metric learning technique is used to train deep Siamese networks under different training scenarios and is compared with state-of-the-art techniques for still-to-video face recognition (FR) on the COX-S2V dataset and a private video-based FR dataset.
Results indicate that the proposed method can achieve accuracy comparable to the upper-bound performance, i.e., the training scenario in which labeled target data are used to fine-tune the Siamese network. |
Database: | OpenAIRE |
External link: |
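The abstract describes the dual-triplet loss and the teacher-based pair labeling only at a high level. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation: it assumes a standard hinge (margin) triplet loss on squared Euclidean distances, assumes the dual-triplet loss is a weighted sum of a source-camera term and a target-camera term, and models the teacher step as thresholding pair distances under the source embedding. The function names (`triplet_loss`, `pseudo_label_pairs`, `dual_triplet_loss`) and the parameters `margin`, `weight`, and `threshold` are hypothetical.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge triplet loss on squared Euclidean distances, averaged over a batch.

    Each argument is an (N, D) array of embeddings; row i forms one triplet.
    """
    d_ap = np.sum((anchor - positive) ** 2, axis=1)  # anchor-positive distances
    d_an = np.sum((anchor - negative) ** 2, axis=1)  # anchor-negative distances
    return float(np.mean(np.maximum(0.0, d_ap - d_an + margin)))


def pseudo_label_pairs(teacher_emb_a, teacher_emb_b, threshold):
    """Hypothetical teacher step (assumption, not from the paper).

    A target-camera pair is labeled positive (True) when its squared distance
    under the source (teacher) embedding falls below `threshold`.
    """
    d = np.sum((teacher_emb_a - teacher_emb_b) ** 2, axis=1)
    return d < threshold


def dual_triplet_loss(src_triplet, tgt_triplet, margin=0.2, weight=1.0):
    """Assumed form: source-camera triplet loss + weight * target-camera triplet loss.

    `src_triplet` and `tgt_triplet` are (anchor, positive, negative) tuples.
    """
    return triplet_loss(*src_triplet, margin=margin) + weight * triplet_loss(
        *tgt_triplet, margin=margin
    )


if __name__ == "__main__":
    # Toy 2-D embeddings: one well-separated source triplet, one violated target triplet.
    src = (np.array([[0.0, 0.0]]), np.array([[0.0, 1.0]]), np.array([[0.0, 3.0]]))
    tgt = (np.array([[0.0, 0.0]]), np.array([[0.0, 2.0]]), np.array([[0.0, 1.0]]))
    print("dual-triplet loss:", dual_triplet_loss(src, tgt))
    print("teacher labels:", pseudo_label_pairs(
        np.zeros((2, 2)), np.array([[0.0, 0.5], [0.0, 2.0]]), threshold=1.0))
```

In a full pipeline, the teacher's pseudo-labels would be used to mine target-camera triplets, and both embeddings would be updated by gradient descent on this loss; that training loop, and any explicit term for aligning the two pairwise distance distributions, is omitted here.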