An AI-assisted Approach for Checking the Completeness of Privacy Policies Against GDPR
Authors: | Mehrdad Sabetzadeh, Katrien Baetens, Peter Goes, Damiano Torre, Sylvie Forastier, Sallam Abualhaija, Lionel C. Briand |
---|---|
Publication year: | 2020 |
Subject: | Computer science [C05] [Engineering, computing & technology]; 021110 strategic defence & security studies; Privacy policies; The General Data Protection Regulation (GDPR); Machine Learning (ML); Supervised learning; 0211 other engineering and technologies; 020207 software engineering; Natural Language Processing (NLP); 02 engineering and technology; Computer security; Metadata; Case study research; Legal compliance; 0202 electrical engineering, electronic engineering, information engineering; False positive paradox; Data Protection Act 1998; Completeness (statistics) |
Source: | RE |
DOI: | 10.1109/re48521.2020.00025 |
Description: | Privacy policies are critical for helping individuals make informed decisions about their personal data. In Europe, privacy policies are subject to compliance with the General Data Protection Regulation (GDPR). If done entirely manually, checking whether a given privacy policy complies with GDPR is both time-consuming and error-prone. Automated support for this task is thus advantageous. At the moment, there is an evident lack of such support on the market. In this paper, we tackle an important dimension of GDPR compliance checking for privacy policies. Specifically, we provide automated support for checking whether the content of a given privacy policy is complete according to the provisions stipulated by GDPR. To do so, we present: (1) a conceptual model to characterize the information content envisaged by GDPR for privacy policies; (2) an AI-assisted approach for classifying the information content in GDPR privacy policies and subsequently checking how well the classified content meets the completeness criteria of interest; and (3) an evaluation of our approach through a case study over 24 unseen privacy policies. For classification, we leverage a combination of Natural Language Processing and supervised Machine Learning. Our experimental material comprises 234 real privacy policies from the fund industry. Our empirical results indicate that our approach detected 45 of the 47 incompleteness issues in the 24 privacy policies it was applied to. Over these policies, the approach had eight false positives. The approach thus has a precision of 85% and a recall of 96% over our case study. |
Database: | OpenAIRE |
External link: |
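
The precision and recall figures quoted in the description can be reproduced from the reported counts. The following is a minimal sketch, assuming that the 45 detected issues are the true positives, that the eight false positives are counted at the same granularity, and that 47 is the total number of actual issues. The function name `precision_recall` and its parameter names are illustrative and do not come from the paper.

```python
def precision_recall(true_positives: int, false_positives: int, total_actual: int) -> tuple[float, float]:
    """Return (precision, recall) from raw counts.

    precision = TP / (TP + FP)
    recall    = TP / total number of actual issues
    """
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / total_actual
    return precision, recall


# Counts taken from the description: 45 of 47 issues detected, 8 false positives.
# Treating them as TP, total actual issues, and FP is an assumption of this sketch.
precision, recall = precision_recall(true_positives=45, false_positives=8, total_actual=47)
print(f"precision = {precision:.1%}, recall = {recall:.1%}")
# prints "precision = 84.9%, recall = 95.7%"
```

Rounded to whole percentages, these values match the 85% precision and 96% recall stated in the description.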