Metric-DST: Mitigating Selection Bias Through Diversity-Guided Semi-Supervised Metric Learning

Author:	Tepeli, Yasin I.; de Wolf, Mathijs; Gonçalves, Joana P.
Year of publication:	2024
Subject:	Computer Science - Machine Learning; Computer Science - Artificial Intelligence
Document type:	Working Paper
Description:	Selection bias poses a critical challenge for fairness in machine learning, as models trained on data that is less representative of the population might exhibit undesirable behavior for underrepresented profiles. Semi-supervised learning strategies like self-training can mitigate selection bias by incorporating unlabeled data into model training to gain further insight into the distribution of the population. However, conventional self-training seeks to include high-confidence data samples, which may reinforce existing model bias and compromise effectiveness. We propose Metric-DST, a diversity-guided self-training strategy that leverages metric learning and its implicit embedding space to counter confidence-based bias through the inclusion of more diverse samples. Metric-DST learned more robust models in the presence of selection bias for generated and real-world datasets with induced bias, as well as a molecular biology prediction task with intrinsic bias. The Metric-DST learning strategy offers a flexible and widely applicable solution to mitigate selection bias and enhance fairness of machine learning models.
Comment:	18 pages main manuscript (4 main figures), 7 pages of supplementary
Database:	arXiv
External link:	http://arxiv.org/abs/2411.18442