Popis: |
Introduction: Distinguishing asthma from COPD in pharmaceutical dispensing databases that do not record diagnoses is highly challenging, but it is necessary for medico-economic analyses of respiratory diseases based on these databases. Machine learning may allow this type of identification, especially when big data are available. Objectives: To develop an algorithm that identifies asthma and COPD without relying on coded diagnoses and that could subsequently be applied to dispensing databases where diagnoses are not available. Methods: The dataset consisted of 976,584 visits (307,976 patients) with a diagnosis of asthma or COPD associated with a treatment prescription, from 2015 to July 2018. The prescribers were 2,500 general practitioners and 70 office-based pulmonologists participating in a permanent longitudinal observatory of prescriptions in ambulatory medicine. Most patients and visits corresponded to a diagnosis of asthma (84.1% and 77.8%, respectively). We used 75% of the dataset to train the algorithm and 25% to evaluate its accuracy. The variables used to train the algorithm were those available in dispensing databases, i.e., age, gender, type/dosing/presentation of the prescribed drug, and date of the prescription. A supervised machine learning approach was tested. Results: The algorithm correctly classified 87.2% of asthma patients and 86.4% of COPD patients. Conclusion: Our algorithm identifies asthma and COPD patients with an accuracy of 86%. Deep learning will be used to improve its performance, and a sensitivity analysis will be performed using different data sources. |
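The 75%/25% split and the per-diagnosis classification rates reported above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say whether the split was stratified or made at patient or visit level, so stratification by diagnosis is an assumption here, and the function names `stratified_split` and `per_class_recall` are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.75, seed=0):
    """Return (train_idx, test_idx), keeping each class's share in both parts.

    Assumption: the split is stratified by diagnosis; the abstract only
    states the 75%/25% proportions.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])
    return sorted(train_idx), sorted(test_idx)


def per_class_recall(y_true, y_pred):
    """Share of each true class that was predicted correctly."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return {label: correct[label] / total[label] for label in total}
```

Reporting the rate separately for each diagnosis matters because the classes are imbalanced: with asthma making up most of the data, a single overall accuracy figure could hide poor performance on COPD.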