Efficient identification of nationally mandated reportable cancer cases using natural language processing and machine learning

Authors: Geoffrey D. Gordon, Andrew O. Westfall, James H. Willig, John D. Osborne, Matthew C. Wyatt, Steven Bethard
Year of publication: 2016
Subject:
genetic structures
020205 medical informatics
Computer science
health care facilities, manpower, and services
education
Health Informatics
02 engineering and technology
Research and Applications
computer.software_genre
Machine learning
03 medical and health sciences
0302 clinical medicine
International Classification of Diseases
health services administration
Neoplasms
0202 electrical engineering, electronic engineering, information engineering
Data Mining
Electronic Health Records
Humans
030212 general & internal medicine
Natural Language Processing
Pathology, Clinical
business.industry
Mandatory Reporting
United States
Cancer registry
Identification (information)
Information extraction
Feature (computer vision)
Electronic data
Artificial intelligence
Diagnosis code
User interface
business
Precision and recall
computer
geographic locations
Source: Journal of the American Medical Informatics Association. 23:1077-1084
ISSN: 1527-974X
1067-5027
Description: Objective: To help cancer registrars efficiently and accurately identify reportable cancer cases. Materials and Methods: The Cancer Registry Control Panel (CRCP) was developed to detect mentions of reportable cancer cases. Its pipeline is built on the Unstructured Information Management Architecture – Asynchronous Scaleout (UIMA-AS) and contains the National Library of Medicine's UIMA MetaMap annotator together with a variety of rule-based UIMA annotators whose main role is to filter out concepts referring to nonreportable cancers. CRCP inspects pathology reports nightly to identify records containing relevant cancer concepts, then combines these with diagnosis codes from the Clinical Electronic Data Warehouse to identify candidate cancer patients using supervised machine learning. Cancer mentions are highlighted in all candidate clinical notes and sorted in CRCP's web interface for faster validation by cancer registrars. Results: CRCP achieved an accuracy of 0.872 and detected reportable cancer cases with a precision of 0.843 and a recall of 0.848. CRCP increases throughput by 22.6% over a baseline (manual review) pathology report inspection system while achieving higher precision and recall. Depending on registrar time constraints, CRCP can increase recall to 0.939, at the expense of precision, by incorporating a data source information feature. Conclusion: CRCP produces accurate results when natural language processing features are applied to the problem of detecting patients with reportable cancer from clinical notes. We show that implementing only a portion of the cancer reporting rules as regular expressions is sufficient to increase the precision, recall, and speed of reportable cancer case detection when combined with off-the-shelf information extraction software and machine learning.
Database: OpenAIRE
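
Illustration: The abstract describes regular-expression rules that filter out nonreportable mentions, and it evaluates detection with precision and recall. The following is a minimal sketch of those two ideas only. It is not the CRCP implementation: the patterns, the term list, and the function names `has_reportable_mention` and `precision_recall` are hypothetical placeholders, and the actual reporting rules used by CRCP are not given in this record.

```python
import re

# Hypothetical cancer vocabulary; CRCP itself relies on MetaMap concepts,
# which this sketch does not attempt to reproduce.
CANCER_TERM = re.compile(
    r"\b(?:carcinoma|malignan\w+|cancer|lymphoma|melanoma)\b", re.IGNORECASE
)

# Hypothetical exclusion rules standing in for "nonreportable" filters.
# These are illustrative assumptions, not the rules CRCP implements.
EXCLUSION_PATTERNS = [
    re.compile(r"\bno evidence of (?:malignancy|carcinoma|cancer)\b", re.IGNORECASE),
    re.compile(r"\bnegative for (?:malignancy|carcinoma|cancer)\b", re.IGNORECASE),
]


def has_reportable_mention(sentence: str) -> bool:
    """Return True if the sentence has a cancer term and no exclusion rule fires."""
    if not CANCER_TERM.search(sentence):
        return False
    return not any(p.search(sentence) for p in EXCLUSION_PATTERNS)


def precision_recall(predicted, actual):
    """Compute (precision, recall) from parallel lists of booleans."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)      # true positives
    fp = sum(1 for p, a in pairs if p and not a)  # false positives
    fn = sum(1 for p, a in pairs if not p and a)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up sentences and labels, for demonstration only.
    sentences = [
        "Invasive ductal carcinoma, grade 2.",
        "No evidence of malignancy.",
        "Benign fibroadenoma.",
    ]
    gold = [True, False, False]
    predicted = [has_reportable_mention(s) for s in sentences]
    print(predicted, precision_recall(predicted, gold))
```

In a system like the one described, a rule filter of this kind would feed features to a supervised classifier rather than make the final decision itself; the sketch omits that step.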