An efficient prototype method to identify and correct misspellings in clinical text

Authors: Yijun Shao, T. Elizabeth Workman, Qing Zeng-Treitler, Guy Divita
Language: English
Year of publication: 2019
Subject:
0301 basic medicine
Research Report
Medical Records Systems, Computerized
Computer science
Pathology, Surgical
lcsh:Medicine
Dictionaries as Topic
computer.software_genre
General Biochemistry, Genetics and Molecular Biology
03 medical and health sciences
Spelling analysis
0302 clinical medicine
Error analysis
Text messaging
False positive paradox
Humans
Word2Vec
030212 general & internal medicine
lcsh:Science (General)
lcsh:QH301-705.5
Language
Natural Language Processing
business.industry
lcsh:R
Reproducibility of Results
General Medicine
Emergency department
Clinical text
Unified Medical Language System
Spelling
Term (time)
Research Note
030104 developmental biology
lcsh:Biology (General)
Vocabulary, Controlled
Word embeddings
Edit distance
Artificial intelligence
business
computer
Spelling correction
Algorithms
Medical Informatics
lcsh:Q1-390
Source: BMC Research Notes
BMC Research Notes, Vol 12, Iss 1, Pp 1-5 (2019)
ISSN: 1756-0500
Description:
Objective: Misspellings in clinical free text present challenges to natural language processing. To identify misspellings and their corrections, we developed a prototype spelling analysis method that combines Word2Vec, Levenshtein edit distance constraints, a lexical resource, and corpus term frequencies. We used the prototype method to process two corpora extracted from Veterans Health Administration resources: surgical pathology reports, and emergency department progress and visit notes. We evaluated performance by measuring positive predictive value and by performing an error analysis of false positive output, using four classifications. We also analyzed the spelling errors in each corpus, using common error classifications.
Results: In this small-scale study of a total of 76,786 clinical notes, the prototype method achieved positive predictive values of 0.9057 for the surgical pathology reports and 0.8979 for the emergency department progress and visit notes in identifying and correcting misspelled words. False positives varied by corpus. Spelling error types were similar between the two corpora; however, the authors of emergency department progress and visit notes made over four times as many errors. Overall, the results of this study suggest that this method could also perform sufficiently well in identifying misspellings in other clinical document types.
Electronic supplementary material: The online version of this article (10.1186/s13104-019-4073-y) contains supplementary material, which is available to authorized users.
Database: OpenAIRE
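
The abstract names four ingredients (Word2Vec, Levenshtein edit distance constraints, a lexical resource, and corpus term frequencies) but does not say how they are combined. The Python sketch below shows one plausible way such a filter could work. It is an illustration under stated assumptions, not the authors' implementation. The function names, the `max_distance=2` threshold, the rule that a correction must be more frequent than the misspelling, and the toy lexicon and counts are all hypothetical. The `neighbours` list stands in for the output of a Word2Vec nearest-neighbour query, which is not performed here.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def suggest_correction(word, neighbours, lexicon, freq, max_distance=2):
    """Return a likely correction for `word`, or None.

    Assumptions (not taken from the abstract): a word already in the
    lexicon is treated as correctly spelled; a candidate must be in the
    lexicon, lie within `max_distance` edits, and be more frequent in
    the corpus than `word`; ties are broken by corpus frequency.
    """
    if word in lexicon:
        return None
    candidates = [
        n for n in neighbours
        if n in lexicon
        and 0 < levenshtein(word, n) <= max_distance
        and freq[n] > freq[word]
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda n: freq[n])


def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """PPV = TP / (TP + FP), the evaluation measure named in the abstract."""
    return true_positives / (true_positives + false_positives)


# Toy data, invented for illustration only.
lexicon = {"biopsy", "benign", "carcinoma"}
freq = Counter({"biopsy": 950, "benign": 700, "carcinoma": 500, "biospy": 4})
neighbours = ["biopsy", "benign", "bx"]  # hypothetical embedding neighbours

print(suggest_correction("biospy", neighbours, lexicon, freq))  # biopsy
```

In this toy run, "benign" is rejected because it is more than two edits away from "biospy", and "bx" is rejected because it is not in the lexicon, which leaves "biopsy" as the only candidate. In a real pipeline the neighbours would come from an embedding model trained on the corpus, and the lexicon from a resource such as the one the abstract refers to.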