Towards Personal Data Anonymization for Social Messaging

Authors:	David Šmahel, Ondřej Sotolář, Jaromír Plhák
Publication year:	2021
Subject:	Text corpus; Information retrieval; Data anonymization; Recall; Computer science; Privacy policy; 05 social sciences; Supervised learning; De-identification; 02 engineering and technology; 16. Peace & justice; computer.software_genre; Domain (software engineering); Named-entity recognition; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; 0509 other social sciences; 050904 information & library sciences; computer
Source:	Text, Speech, and Dialogue ISBN: 9783030835262 TDS
DOI:	10.1007/978-3-030-83527-9_24
Description:	We present a method for building text corpora for the supervised learning of text-to-text anonymization while maintaining a strict privacy policy. In our solution, personal data entities are detected, classified, and anonymized. We use available machine-learning methods, such as named-entity recognition, and improve their performance by grouping multiple entities into larger units based on the theory of tabular data anonymization. Experimental results on annotated Czech Facebook Messenger conversations show that our solution achieves recall comparable to that of human annotators. Precision, on the other hand, is much lower because named-entity recognition performs poorly in the domain of social messaging conversations. The resulting anonymized text is of high utility because the replacement methods produce natural text.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::dbd9c287b1fb2b9e60cda2884a1ed08f https://doi.org/10.1007/978-3-030-83527-9_24