758. High-Throughput Mining of Electronic Medical Records Using Generalizable Autonomous Scripts

Authors: Gail J. Demmler-Harrison, Ryan H. Rochat
Year of publication: 2019
Subject:
Source: Open Forum Infectious Diseases
ISSN: 2328-8957
DOI: 10.1093/ofid/ofz360.826
Description: Background: The electronic medical record (EMR) has become a modern compendium of health information, from broad clinical assessments down to an individual’s heart rate. The wealth of information in these EMRs holds promise for clinical discovery and hypothesis generation. Unfortunately, as these systems have become more robust, mining them for relevant clinical information has been hindered by the overall data architecture and often requires the expertise of a clinical informatician. However, because the information presented to the clinician through the digital workspace is derived from the core EMR database, its format is well structured and can be mined using text recognition and parsing scripts. Methods: Here we present a program that parses output from Epic Hyperspace® and generates a relational database of clinical information. For ease of use, our protocol capitalizes on the familiarity of Microsoft Excel® as an intermediary for storing the raw output from the EMR, with data parsing and processing scripts written in SAS V9.4 (Cary, North Carolina). Results: As a proof of concept, we extracted the diagnosis codes and standard laboratories for 190 patients seen in our Congenital Cytomegalovirus Clinic at Texas Children’s Hospital in Houston, Texas. Manual extraction of these data into Microsoft Excel® took 1 hour, and the scripts to parse the data took less than 5 seconds to run. Data from these patients included 3800 ICD-10 codes (along with their metadata) and 33,000 individual laboratory values. In total, more than 850,000 characters were extracted from the EMR using this technique. Manual review of 10 randomly selected charts found the data in perfect concordance with the EMR, a direct reflection of the fidelity of the parsing scripts. On average, an experienced user was able to enter three ICD-10 codes per minute and six individual laboratory values per minute. At those rates, the same process would have taken at least 110 hours using a conventional chart review technique. Conclusion: High-throughput data mining tools have the potential to improve the feasibility of studies dependent upon information stored in the EMR. When coupled with specific content knowledge, this approach can consolidate months of data collection into a day’s task. Disclosures: All authors: No reported disclosures.
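The estimate of at least 110 hours for conventional chart review can be checked from the counts and entry rates quoted in the abstract. The sketch below is only an illustration of that arithmetic, not part of the authors' SAS scripts; the function name `manual_entry_hours` is hypothetical, and it assumes the quoted per-minute rates hold uniformly, with no time spent navigating between charts.

```python
# Figures quoted in the abstract.
ICD10_CODES = 3800        # diagnosis codes extracted
LAB_VALUES = 33000        # individual laboratory values extracted
CODES_PER_MINUTE = 3      # manual entry rate for ICD-10 codes
LABS_PER_MINUTE = 6       # manual entry rate for laboratory values


def manual_entry_hours(n_codes, n_labs, code_rate, lab_rate):
    """Hours needed to enter the data by hand (hypothetical helper).

    Assumes constant entry rates and ignores chart navigation time,
    so the result is a lower bound.
    """
    minutes = n_codes / code_rate + n_labs / lab_rate
    return minutes / 60


hours = manual_entry_hours(ICD10_CODES, LAB_VALUES,
                           CODES_PER_MINUTE, LABS_PER_MINUTE)
print(f"Estimated manual entry time: {hours:.1f} hours")
```

Under these assumptions the total comes to roughly 113 hours (about 21 hours for the codes and 92 hours for the laboratory values), which is consistent with the stated lower bound of 110 hours and compares with about 1 hour for the scripted workflow.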
Database: OpenAIRE