Development of machine learning algorithms for screening of pulmonary disease
Author: | Infante, Christian (Christian F.) |
---|---|
Language: | English |
Year of publication: | 2017 |
Subject: | |
Document type: | Master's thesis |
Description: | Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 131-136). Pulmonary diseases are a leading cause of death worldwide, and their burden falls disproportionately on the developing world. The MIT Mobile Technology Lab has developed a Mobile Kit which screens and diagnoses COPD and asthma. In this thesis, we analyze and further develop tools in this kit. All of the data for this thesis were collected as part of a large medical study with our partner, the Chest Research Foundation (CRF), in Pune, India. The data set comprised 325 patients (135 healthy, 76 asthma, 46 COPD, 29 allergic rhinitis, and 39 other). Among the asthma and COPD patients, 67 had allergic rhinitis. All patients were examined using a mobile diagnostic kit designed at MIT consisting of a mobile stethoscope, peak flow meter, and questionnaire. All patients were also examined in a conventional pulmonary function testing (PFT) lab, the gold standard. The performance of our Mobile Kit platform was previously analyzed and presented in a prior Master's thesis. Building on our group's prior work, in this thesis we present three main contributions: 1) we have created a classifier for a new disease category, allergic rhinitis, which accounts for roughly half of all respiratory clinic patients; 2) we have explored and analyzed the value of cough sounds as a diagnostic tool for pulmonary disease; and 3) we have analyzed data from a pulmonary function testing lab which were collected in parallel with our group's Mobile Diagnostic Kit, and have compared the performance. 
In the first section of this thesis, we created a classifier for allergic rhinitis diagnosis, using the same multi-layer classification structure as was used in our group's prior work. This integrated classifier demonstrated moderate performance, with AUCs ranging from 0.87 to 0.90. As a second approach, a standalone classifier was also explored, which produced much better results, with an AUC of 0.96. Going forward, we plan to use an independent classifier as part of our diagnostics. In the second part of this thesis, we explored the value of cough sounds for pulmonary diagnosis. Various classifiers were created for the screening and diagnosis of pulmonary disease through the analysis of cough sounds. We first created a classifier for the detection of wet and dry coughs (which can indicate overall pulmonary health), which had high classification performance but limited diagnostic value. We then explored the diagnostic value of specific physical features of the cough sounds, including kurtosis, variance, zero crossing irregularity, and rate of decay. The utility of these features was then analyzed both in isolation and integrated with other Mobile Kit tools. These cough sound features were found to have value as a simple diagnostic tool for distinguishing between asthma and COPD, as well as for assessing basic pulmonary health; however, cough sounds alone provide less value than other diagnostic tools for disease-specific diagnosis. When integrated with the Mobile Kit tools, cough sounds only improved performance on lung sounds; otherwise, coughs did not have any added benefit. Given the ease of data collection, we demonstrated that cough sounds can play a role in simple disease screening for use with community health workers. For the third major part of this thesis, we performed a thorough analysis of pulmonary function testing (PFT) data, which is the gold standard for pulmonary disease diagnosis. 
The PFT laboratory tools included spirometry, impulse oscillometry, body plethysmography, and lung gas diffusion (DLCO). We first explored a multi-layer classification structure. Using this structure, the PFT machines produced good results on each classification layer: Healthy vs. Unhealthy [AUC=0.90 (0.04)], Obstructive (Obs.) vs. Non-obstructive [AUC=0.95 (0.05)], Obs. AR vs. Obs. Non-AR [AUC=0.72 (0.10)], COPD + AR vs. Asthma + AR [AUC=0.95 (0.15)], COPD vs. Asthma [AUC=1.00 (0.04)], Non-Obs. AR vs. Non-Obs. Non-AR [AUC=0.92 (0.12)]. These results are only moderately better than the results yielded by our Mobile Diagnostic Kit: Healthy vs. Unhealthy [AUC=0.98 (0.02)], Obstructive (Obs.) vs. Non-obstructive [AUC=0.96 (0.04)], Obs. AR vs. Obs. Non-AR [AUC=0.90 (0.06)], COPD + AR vs. Asthma + AR [AUC=0.93 (0.09)], COPD vs. Asthma [AUC=1.00 (0.00)], Non-Obs. AR vs. Non-Obs. Non-AR [AUC=0.87 (0.12)]. Although these results are moderately good, the compounded error represents an unacceptable level of misclassification. As an alternative to the multi-layer classification structure, we explored the use of individual classifiers for each disease, which yielded much better results. For the PFT data, the individual classifiers produced the following results: asthma [AUC=0.96 (0.04)], COPD [AUC=0.99 (0.03)], and allergic rhinitis [AUC=0.74 (0.08)]. For the Mobile Kit data, the individual classifiers produced the following results: asthma [AUC=0.90 (0.05)], COPD [AUC=0.94 (0.05)], and allergic rhinitis [AUC=0.96 (0.03)]. In summary, building on our group's prior work, in this thesis we have expanded the capability of our Mobile Diagnostic Kit to include allergic rhinitis, as well as improved the diagnostic specificity to account for co-morbidities (asthma + AR, COPD + AR). 
Although our multi-layer classifier design has value in providing diagnostic insight and feedback to clinicians, we recommend that future versions of our Mobile Kit also include individual classifiers for specific disease categories (asthma, COPD, allergic rhinitis, asthma + AR, COPD + AR) in order to improve performance. by Christian Infante. M. Eng. |
Database: | Networked Digital Library of Theses & Dissertations |
External link: |
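
Note on the cough-sound features: the description names kurtosis, variance, zero crossing irregularity, and rate of decay as features of a cough recording but does not define them. The following is a minimal sketch of how such features might be computed from a mono signal, not the thesis's implementation. The function name `cough_features`, the `frame_len` parameter, and the definitions used for zero crossing irregularity (coefficient of variation of the gaps between sign changes) and rate of decay (negative slope of log RMS energy after the loudest frame) are all assumptions made for illustration.

```python
import math


def cough_features(signal, sample_rate, frame_len=256):
    """Return four illustrative time-domain features of one recording.

    Hypothetical helper: the definitions below are assumptions, not taken
    from the thesis. Assumes a non-constant signal (variance > 0).
    """
    n = len(signal)
    mean = sum(signal) / n
    x = [s - mean for s in signal]  # remove any DC offset

    variance = sum(v * v for v in x) / n
    # Pearson kurtosis: fourth central moment over squared variance
    # (about 3.0 for Gaussian noise, 1.5 for a pure sinusoid).
    kurtosis = (sum(v ** 4 for v in x) / n) / variance ** 2

    # Zero crossing irregularity (assumed definition): coefficient of
    # variation of the gaps, in samples, between successive sign changes.
    crossings = [i for i in range(1, n) if (x[i - 1] < 0) != (x[i] < 0)]
    gaps = [b - a for a, b in zip(crossings, crossings[1:])]
    zc_irregularity = 0.0
    if len(gaps) > 1:
        gap_mean = sum(gaps) / len(gaps)
        gap_std = math.sqrt(sum((g - gap_mean) ** 2 for g in gaps) / len(gaps))
        zc_irregularity = gap_std / gap_mean

    # Rate of decay (assumed definition): negative slope, in 1/second, of a
    # least-squares line through log(RMS) from the loudest frame onward.
    rms = []
    for start in range(0, n - frame_len + 1, frame_len):
        frame = x[start:start + frame_len]
        rms.append(math.sqrt(sum(v * v for v in frame) / frame_len))
    tail = rms[rms.index(max(rms)):] if rms else []
    decay_rate = 0.0
    if len(tail) > 1:
        times = [k * frame_len / sample_rate for k in range(len(tail))]
        logs = [math.log(r + 1e-12) for r in tail]  # epsilon avoids log(0)
        t_mean = sum(times) / len(times)
        l_mean = sum(logs) / len(logs)
        slope = (sum((t - t_mean) * (l - l_mean) for t, l in zip(times, logs))
                 / sum((t - t_mean) ** 2 for t in times))
        decay_rate = -slope

    return {
        "variance": variance,
        "kurtosis": kurtosis,
        "zero_crossing_irregularity": zc_irregularity,
        "decay_rate": decay_rate,
    }
```

A feature dictionary of this kind could be passed to any standard classifier. Which classifier the thesis used, and how the features were combined with the other Mobile Kit measurements, is not stated in this record.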