Author:
Anne Dirkson, Suzan Verberne, Abeed Sarker, Wessel Kraaij
Language:
English
Year of publication:
2019
Source:
Multimodal Technologies and Interaction, Vol 3, Iss 3, p 60 (2019)
Document type:
article
ISSN:
2414-4088
DOI:
10.3390/mti3030060
Description:
In the medical domain, user-generated social media text is increasingly used as a valuable complementary knowledge source to scientific medical literature. The extraction of this knowledge is complicated by colloquial language use and misspellings. However, lexical normalization of such data has not been addressed effectively. This paper presents a data-driven lexical normalization pipeline with a novel spelling correction module for medical social media. Our method significantly outperforms state-of-the-art spelling correction methods and can detect mistakes with an F1 of 0.63 despite extreme imbalance in the data. We also present the first corpus for spelling mistake detection and correction in a medical patient forum.
Database:
Directory of Open Access Journals