Data-Driven Lexical Normalization for Medical Social Media

Authors: Anne Dirkson, Suzan Verberne, Abeed Sarker, Wessel Kraaij
Language: English
Year of publication: 2019
Subject:
Source: Multimodal Technologies and Interaction, Vol 3, Iss 3, p 60 (2019)
Document type: article
ISSN: 2414-4088
DOI: 10.3390/mti3030060
Description: In the medical domain, user-generated social media text is increasingly used as a valuable complementary knowledge source to scientific medical literature. The extraction of this knowledge is complicated by colloquial language use and misspellings. However, lexical normalization of such data has not been addressed effectively. This paper presents a data-driven lexical normalization pipeline with a novel spelling correction module for medical social media. Our method significantly outperforms state-of-the-art spelling correction methods and can detect mistakes with an F1 of 0.63 despite extreme imbalance in the data. We also present the first corpus for spelling mistake detection and correction in a medical patient forum.
Database: Directory of Open Access Journals