Popis: |
With organizations storing, and even openly publishing, their data for further processing, privacy becomes an issue. Such open data should retain its original structure while protecting sensitive personal data. Our aim was to develop fast and secure software for offline anonymization of (distributed) big data. Herein, we describe the speed and security requirements for anonymization systems, popular anonymization techniques, and de-anonymization attacks. We give a detailed description of our software for in-situ anonymization of big data distributed across a cluster, which we tested on a real telco call detail record (CDR) dataset of around 500 GB. |