Description: |
Police analysts increasingly use data analysis techniques to make decisions that affect society. Previous research shows that excluding ethically sensitive information (features) such as name, surname, and address during the data analysis process has implications for accuracy and decision-making, which may have negative consequences for individuals or groups within society. To assess whether the use of ethically sensitive features affects decision-making, we identified two important aspects: (i) transparency of the feature selection, and (ii) a way of assessing the impact of the selected features. In this paper, we define ethically sensitive information from two aspects: (a) features that identify an individual, known as personally identifiable information, and (b) sensitive features that could be used to discriminate against an individual, known as prejudice information. We investigate whether the selection of these features affects how accurately co-offenders are identified. To this end, we propose a privacy scale that assigns a value to each feature according to its sensitivity label. We used an anonymized dataset received from a UK law enforcement agency, from which ground-truth samples with known co-offenders were selected for this study. Using the k-modes clustering algorithm, we included and excluded personal, prejudice, and other attributes to assess the relationship between the privacy score of the combined input features and the accuracy of the clustering. The results suggest that the use of ethically sensitive features does affect how accurately potential co-offenders are identified.
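A minimal sketch of the kind of experiment the abstract describes, assuming the open-source `kmodes` Python package for k-modes clustering; the feature names, sensitivity labels, privacy weights, and toy data below are illustrative assumptions, not the paper's actual privacy scale or dataset.

    # Sketch: privacy score of a feature subset vs. k-modes co-offender grouping.
    # Assumptions (not from the paper): sensitivity labels, weights, column names,
    # and the `kmodes` package; the real dataset and scale are not public.
    import numpy as np
    import pandas as pd
    from kmodes.kmodes import KModes

    # Hypothetical sensitivity labels: personal (identifying), prejudice
    # (potentially discriminatory), other. Each label maps to a privacy weight.
    SENSITIVITY = {
        "name": "personal", "surname": "personal", "address": "personal",
        "ethnicity": "prejudice", "religion": "prejudice",
        "offence_type": "other", "location_area": "other",
    }
    WEIGHTS = {"personal": 3, "prejudice": 2, "other": 1}  # illustrative values

    def privacy_score(features):
        """Sum the privacy weights of the selected input features."""
        return sum(WEIGHTS[SENSITIVITY[f]] for f in features)

    def cluster_accuracy(df, features, cooffender_pairs, n_clusters):
        """Cluster records with k-modes on the chosen features and report how
        often known co-offender pairs land in the same cluster."""
        km = KModes(n_clusters=n_clusters, init="Huang", n_init=5, random_state=0)
        labels = km.fit_predict(df[features].astype(str).to_numpy())
        hits = [labels[i] == labels[j] for i, j in cooffender_pairs]
        return float(np.mean(hits))

    if __name__ == "__main__":
        # Toy categorical data and toy ground-truth co-offender pairs.
        rng = np.random.default_rng(0)
        df = pd.DataFrame({f: rng.integers(0, 4, 100).astype(str) for f in SENSITIVITY})
        pairs = [(0, 1), (2, 3), (4, 5)]
        # Compare a feature set without sensitive attributes against one with them.
        for feats in (["offence_type", "location_area"], list(SENSITIVITY)):
            print(feats, privacy_score(feats), cluster_accuracy(df, feats, pairs, 10))

The intent of the sketch is only to show the shape of the comparison: each feature subset yields a privacy score and a clustering accuracy, and the study relates the two.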