Investigating the effects of unbalanced predictors on identifying discriminatory predictors

Author: Hung, Tzu-Han, 洪子涵
Year of publication: 2018
Document type: 學位論文 ; thesis
Description: 106
One key challenge for the data mining community is data imbalance. While the vast majority of research focuses on imbalance in the outcome class, this research investigates another type: predictor imbalance, a problem that can lead to discrimination against minority groups in automated decision-making. We examine the effects of predictor imbalance on classification trees and logistic regression. We posit that unbalanced predictors are likely to be ignored by impurity-based trees and some statistics-based trees, even when they perfectly classify the observations in rare subgroups (e.g., minority racial groups or rare diseases). Ignoring such a predictor may lead to unjust decisions, which is a particular concern now that many managerial decisions are driven by AI and data-analytic outputs. Guidelines for detecting and addressing discrimination that arises from unbalanced predictors are also provided.
Database: Networked Digital Library of Theses & Dissertations
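
The abstract's claim that impurity-based trees tend to ignore unbalanced predictors can be illustrated with a small calculation. The sketch below is not taken from the thesis: it assumes Gini impurity as the splitting criterion, and the counts (1,000 observations, a rare flag that covers 10 of them) are hypothetical numbers chosen only for illustration. It compares the impurity decrease of a rare predictor that perfectly classifies its subgroup with that of a balanced but only weakly informative predictor.

```python
def gini(pos, neg):
    """Gini impurity of a node holding `pos` positive and `neg` negative cases."""
    n = pos + neg
    if n == 0:
        return 0.0
    p = pos / n
    return 2 * p * (1 - p)


def gini_gain(left, right):
    """Impurity decrease of a binary split; each child is a (pos, neg) tuple."""
    lp, ln = left
    rp, rn = right
    n_left, n_right = lp + ln, rp + rn
    n = n_left + n_right
    parent = gini(lp + rp, ln + rn)
    children = (n_left / n) * gini(lp, ln) + (n_right / n) * gini(rp, rn)
    return parent - children


# Hypothetical data: 500 positive and 500 negative observations.
# Rare predictor: flags 10 observations, all positive (a pure child node).
rare_gain = gini_gain(left=(10, 0), right=(490, 500))

# Balanced predictor: splits the data 500/500 but separates classes only 60/40.
balanced_gain = gini_gain(left=(300, 200), right=(200, 300))

print(f"rare predictor gain:     {rare_gain:.4f}")
print(f"balanced predictor gain: {balanced_gain:.4f}")
print("tree prefers:", "balanced" if balanced_gain > rare_gain else "rare")
```

Under these assumed counts the rare predictor produces a perfectly pure child, yet its weighted impurity decrease is smaller than that of the balanced predictor, because the pure child holds only 1% of the observations. A greedy tree would therefore split on the balanced predictor first, which is consistent with the behavior the abstract describes.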