Using Random Forest Models to Identify Correlates of a Diabetic Peripheral Neuropathy Diagnosis from Electronic Health Record Data
Author: | Jack Mardekian, Alesia Sadosky, Sarah Dubrava, Bruce Parsons, E Jay Bienen, John D. Markman, Markay Hopps |
---|---|
Year of publication: | 2016 |
Subject: |
Adult; Male; 0301 basic medicine; medicine.medical_specialty; Adolescent; 030209 endocrinology & metabolism; Type 2 diabetes; Young Adult; 03 medical and health sciences; 0302 clinical medicine; Diabetic Neuropathies; Electronic health record; Health care; Statistics; medicine; Data Mining; Electronic Health Records; Humans; Medical prescription; Aged; Retrospective Studies; Receiver operating characteristic; business.industry; General Medicine; Middle Aged; medicine.disease; Confidence interval; Random forest; 030104 developmental biology; Anesthesiology and Pain Medicine; Peripheral neuropathy; Diabetes Mellitus Type 2; ROC Curve; Emergency medicine; Female; Neurology (clinical); business |
Source: | Pain Medicine. 18:107-115 |
ISSN: | 1526-4637 1526-2375 |
Description: | Objective. To identify variables correlated with a diagnosis of diabetic peripheral neuropathy (DPN) using random forest modeling applied to electronic health records. Design. Retrospective analysis. Setting. Humedica de-identified electronic health records database. Subjects. Subjects ≥ 18 years old with type 2 diabetes from January 1, 2008 to September 30, 2013, with continuous data for 1 year pre- and postindex, were identified: those with DPN (n = 35,050) and those without DPN (n = 288,328). Methods. Demographic, clinical, and health care resource utilization variables (e.g., inpatient and outpatient encounters, medications, and procedures) were input into a random forest model to identify the most important correlates of a DPN diagnosis. Random forest modeling is a computationally intensive, robust data mining technique that accommodates large sets of variables to identify associated factors using an ensemble of classification trees. Accuracy of the model was evaluated using receiver operating characteristic (ROC) curves. Results. The final random forest model consisted of the following variables associated with a DPN diagnosis (relative importance in parentheses): Charlson Comorbidity Index score (100%), age (37.1%), number of pre-index procedures and services (29.7%), number of pre-index outpatient prescriptions (24.2%), number of pre-index outpatient visits (18.3%), number of pre-index laboratory visits (16.9%), number of pre-index outpatient office visits (12.1%), number of inpatient prescriptions (5.9%), and number of pain-related medication prescriptions (4.4%). ROC analysis confirmed model performance, with an area under the curve of 0.824 and accuracy of 89.6% (95% confidence interval 89.4%, 89.8%). Conclusions. Random forest modeling can determine the likelihood of a DPN diagnosis. Further validation of the random forest model may help facilitate earlier diagnosis and enhance management strategies. |
Database: | OpenAIRE |
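
The abstract reports variable importance scaled so that the top-ranked variable reads 100%, and it summarizes discrimination with the area under the ROC curve. The following is a minimal sketch of those two calculations in plain Python. It is not the authors' code: the function names `relative_importance` and `roc_auc`, the variable names, and all numbers are illustrative assumptions, not study data.

```python
from itertools import product


def relative_importance(raw):
    """Scale raw importance scores so the largest equals 100 (percent)."""
    top = max(raw.values())
    return {name: 100.0 * value / top for name, value in raw.items()}


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives
    a higher score than a randomly chosen negative case (ties count 0.5).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


# Toy values, invented for illustration only (not study data).
raw_scores = {"comorbidity_index": 8.0, "age": 2.0, "outpatient_visits": 1.0}
labels = [1, 1, 0, 0, 0]             # 1 = DPN diagnosis, 0 = no DPN
scores = [0.9, 0.4, 0.5, 0.2, 0.1]   # hypothetical predicted probabilities

scaled = relative_importance(raw_scores)
auc = roc_auc(labels, scores)
```

The pairwise comparison is quadratic in the number of cases, so it only suits small examples; for a cohort of the size described in the abstract, a rank-based implementation such as scikit-learn's `roc_auc_score` would be the more practical choice.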