Exposing safe correlations in transactional datasets.

Authors: Chicha, Elie; Al Bouna, Bechara; Wünsche, Kay; Chbeir, Richard
Source: Service Oriented Computing & Applications; Dec 2021, Vol. 15 Issue 4, p289-307, 19p
Abstract: A particularly challenging problem for data anonymization is dealing with transactional data. Most anonymization methods assume homogeneous, independent and identically distributed (i.i.d.) data; "flattening" transactional data to satisfy this model results in wide, sparse data that does not anonymize well with traditional techniques. While there have been some approaches for generalization-based anonymization, bucketization techniques (e.g., anatomy) pose new challenges. In particular, bucketization provides the opportunity to learn correlations between data items, but also a risk of identifying individuals because of dependencies inferred from such correlations. We present a method that balances these issues, retaining the ability to discover correlations in the data, while hiding dependencies that would enable correlations to be used to link specific values to individuals. We introduce a correlation anonymization constraint that ensures correlations do not allow data to be linked to a specific individual, and an elastic safe grouping algorithm that meets this constraint while preserving data correlations. We evaluate the utility loss on a transactional rental dataset. [ABSTRACT FROM AUTHOR]
Database: Complementary Index