Utility-Embraced Microaggregation for Machine Learning Applications

Authors:	Soobin Lee, Won-Yong Shin
Language:	English
Publication year:	2022
Subject:	Clustering; data utility; dimensionality reduction; k-anonymity; microaggregation; Electrical engineering. Electronics. Nuclear engineering; TK1-9971
Source:	IEEE Access, Vol 10, Pp 64535-64546 (2022)
Document type:	article
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2022.3183201
Description:	With access to vast amounts of data, privacy protection is more important than ever. Among various de-identification (anonymization) techniques, $k$-anonymous microaggregation has been widely studied since it enables a balance between confidentiality and data utility. Although many microaggregation methods aim to reduce information loss and/or computational complexity, machine learning (ML) models trained on the resulting aggregated data are often less effective than expected. Motivated by the fact that ML models can be heavily influenced by even slightly distorted training data, we examine the performance of microaggregation in terms of not only data privacy but also data utility. In this paper, we propose Util-MA, a new utility-embraced microaggregation framework for effective ML applications. Specifically, unlike prior studies that apply microaggregation techniques directly to raw data, we design a unified framework that can potentially enhance data utility while preserving $k$-anonymity through preprocessing steps including dimensionality reduction and clustering. Using real-world datasets, we empirically demonstrate the superiority of Util-MA over benchmark microaggregation methods in terms of classification accuracy. Moreover, we investigate the importance of preprocessing by measuring key performance indicators (KPIs) of clustering; the clustering stage of Util-MA leads to high classification performance when the clustering results substantially coincide with the ground truth labels. We also establish a close relationship between the KPIs of clustering and the classification accuracies, which tends to emerge when Util-MA achieves a gain over the benchmark method. Our framework is microaggregation-model-agnostic; thus, the underlying microaggregation model can be chosen according to one’s needs and ML tasks.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/e77b04dd81be4f1b81c652e52edf6882
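
Illustrative note: the description above refers to microaggregation that preserves $k$-anonymity after a clustering stage. The following is a minimal sketch of that last step only, not the authors' Util-MA algorithm. It assumes cluster labels are already available from some earlier stage, and it uses a simple fixed-size grouping heuristic (sort by label, then by feature values, and cut into groups of at least k records) as a stand-in for whatever microaggregation model is actually chosen. The function name `microaggregate` and the toy data are hypothetical.

```python
def microaggregate(records, labels, k):
    """Replace each record by the centroid of a group of >= k records.

    Assumption: a simple fixed-size heuristic, not the method of the paper.
    `labels` are cluster labels from a hypothetical preprocessing stage.
    """
    n = len(records)
    if k < 1 or n < k:
        raise ValueError("need k >= 1 and at least k records")
    dim = len(records[0])

    # Order records so that members of the same cluster are adjacent.
    order = sorted(range(n), key=lambda i: (labels[i], records[i]))

    # Cut the ordering into consecutive groups of exactly k records;
    # the final group absorbs the remainder (size between k and 2k - 1).
    cut = n - n % k
    groups = [order[s:s + k] for s in range(0, cut, k)]
    groups[-1].extend(order[cut:])

    # Publish the group centroid in place of every member of the group.
    anonymized = [None] * n
    for group in groups:
        centroid = [sum(records[i][j] for i in group) / len(group)
                    for j in range(dim)]
        for i in group:
            anonymized[i] = list(centroid)
    return anonymized, groups


# Toy data: two well-separated clusters (labels are assumed, not computed).
records = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1],
           [8.0, 9.0], [8.2, 9.1], [7.9, 8.8], [8.1, 9.2]]
labels = [0, 0, 0, 1, 1, 1, 1]
anonymized, groups = microaggregate(records, labels, k=3)
```

Every published record is then shared by at least k individuals. In this sketch a group can straddle two clusters when a cluster's size is not a multiple of k; a more careful variant would form groups within each cluster separately.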