Automatically pre-screening patients for the rare disease aromatic l-amino acid decarboxylase deficiency using knowledge engineering, natural language processing, and machine learning on a large EHR population.
Authors: | Cohen AM; Department of Medical Informatics and Clinical Epidemiology, School of Medicine, Oregon Health & Science University, Portland, OR 97239, United States., Kaner J; Department of Medical Informatics and Clinical Epidemiology, School of Medicine, Oregon Health & Science University, Portland, OR 97239, United States., Miller R; PTC Therapeutics, South Plainfield, NJ 07080, United States., Kopesky JW; PTC Therapeutics, South Plainfield, NJ 07080, United States., Hersh W; Department of Medical Informatics and Clinical Epidemiology, School of Medicine, Oregon Health & Science University, Portland, OR 97239, United States. |
---|---|
Language: | English |
Source: | Journal of the American Medical Informatics Association : JAMIA [J Am Med Inform Assoc] 2024 Feb 16; Vol. 31 (3), pp. 692-704. |
DOI: | 10.1093/jamia/ocad244 |
Abstract: | Objectives: Electronic health record (EHR) data may facilitate the identification of rare diseases in patients, such as aromatic l-amino acid decarboxylase deficiency (AADCd), an autosomal recessive disease caused by pathogenic variants in the dopa decarboxylase gene. Deficiency of the AADC enzyme results in combined severe reductions in monoamine neurotransmitters: dopamine, serotonin, epinephrine, and norepinephrine. This leads to widespread neurological complications affecting motor, behavioral, and autonomic function. The goal of this study was to use EHR data to identify previously undiagnosed patients who may have AADCd without available training cases for the disease. Materials and Methods: A multiple symptom and related disease annotated dataset was created and used to train individual concept classifiers on annotated sentence data. A multistep algorithm was then used to combine concept predictions into a single patient rank value. Results: Using an 8000-patient dataset that the algorithms had not seen before ranking, the top and bottom 200 ranked patients were manually reviewed for clinical indications of performing an AADCd diagnostic screening test. The top-ranked patients were 22.5% positively assessed for diagnostic screening, with 0% for the bottom-ranked patients. This result is statistically significant at P < .0001. Conclusion: This work validates the approach that large-scale rare-disease screening can be accomplished by combining predictions for relevant individual symptoms and related conditions which are much more common and for which training data is easier to create. (© The Author(s) 2023. Published by Oxford University Press on behalf of the American Medical Informatics Association.) |
Database: | MEDLINE |