Dataset bias exposed in face verification

Authors:	Carlos V. Regueiro, Roberto Iglesias, Xosé M. Pardo, Eric Lopez-Lopez, Fernando E. Casado
Contributors:	Universidade de Santiago de Compostela. Centro de Investigación en Tecnoloxías da Información; Universidade de Santiago de Compostela. Departamento de Electrónica e Computación
Publication year:	2019
Subject:	Independent and identically distributed random variables; Learning (artificial intelligence); Facial images; Computer science; Feature vector; 0211 other engineering and technologies; 02 engineering and technology; Machine learning; Facial recognition system; Domain (software engineering); Facial verification methods; Face verification; 0202 electrical engineering electronic engineering information engineering; Public available datasets; Target domain; Face recognition; Representation (mathematics); 021110 strategic defence & security studies; Source domain; Exogenous factor; Signal Processing; Mobile devices; 020201 artificial intelligence & image processing; Computer Vision and Pattern Recognition; Artificial intelligence; Mobile device; Software
Source:	Minerva. Repositorio Institucional de la Universidad de Santiago de Compostela; Universidad de Santiago de Compostela (USC)
ISSN:	2047-4946 2047-4938
DOI:	10.1049/iet-bmt.2018.5224
Description:	This is the peer-reviewed version of the following article: López‐López, E., Pardo, X.M., Regueiro, C.V., Iglesias, R. and Casado, F.E. (2019), Dataset bias exposed in face verification. IET Biom., 8: 249-258, which has been published in final form at https://doi.org/10.1049/iet-bmt.2018.5224. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Use of Self-Archived Versions. Most facial verification methods assume that training and testing sets contain independent and identically distributed samples, although in many real applications this assumption does not hold. Whenever gathering a representative dataset in the target domain is infeasible, it is necessary to choose one of the already available (source domain) datasets. This work studies the differences among six public datasets and how these differences affect the performance of the learned methods. In the considered scenario of mobile devices, the individual of interest is enrolled using a few facial images taken in the operational domain, while training impostors are drawn from one of the publicly available datasets. The work sheds light on the inherent differences among the datasets and on the potential harms that should be considered when they are combined for training and testing. Results indicate that performance drops whenever training and testing are done on different datasets, compared to using the same dataset in both phases. However, the size of the drop depends strongly on the kind of features. In addition, the representation of samples in the feature space offers insight into the extent to which bias is an endogenous or an exogenous factor.
Funding:	This work has received financial support from the Xunta de Galicia, Consellería de Cultura, Educación e Ordenación Universitaria (Accreditation 2016–2019, EDG431G/01 and ED431G/08, and reference competitive group 2014–2017, GRC2014/030), the European Union: European Social Fund (ESF), European Regional Development Fund (ERDF) and FEDER funds and (AEI/FEDER, UE) grant number TIN2017‐90135‐R. Eric López has received financial support from the Xunta de Galicia and the European Union (European Social Fund ‐ ESF).
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::faa8e57dbd60b177f5f4d7af6ea211b4 https://doi.org/10.1049/iet-bmt.2018.5224