Description: |
Behavioral data, collected from our daily interactions with technology, have driven scientific advances. Yet the collection and sharing of these data raise legitimate privacy concerns, as individuals can often be reidentified. Current identification attacks, however, require auxiliary information that roughly matches the information available in the dataset, limiting their applicability. Here, we propose an entropy-based profiling model to learn time-persistent profiles. Using auxiliary information about a single target collected over a nonoverlapping time period, we show that individuals are correctly identified 79% of the time in a large location dataset of 0.5 million individuals and 65.2% of the time in a grocery shopping dataset of 85,000 individuals. We further show that accuracy decreases only slowly over time and that the model is robust to state-of-the-art noise addition. Our results show that much more auxiliary information than previously believed can be used to identify individuals, challenging deidentification practices and what currently constitutes legally anonymous data. |