Popis: |
We sought to summarize the study design, modelling strategies, and performance measures reported in studies on clinical prediction models developed using machine learning techniques.We search PubMed for articles published between 01/01/2018 and 31/12/2019, describing the development or the development with external validation of a multivariable prediction model using any supervised machine learning technique. No restrictions were made based on study design, data source, or predicted patient-related health outcomes.We included 152 studies, 58 (38.2% [95% CI 30.8-46.1]) were diagnostic and 94 (61.8% [95% CI 53.9-69.2]) prognostic studies. Most studies reported only the development of prediction models (n = 133, 87.5% [95% CI 81.3-91.8]), focused on binary outcomes (n = 131, 86.2% [95% CI 79.8-90.8), and did not report a sample size calculation (n = 125, 82.2% [95% CI 75.4-87.5]). The most common algorithms used were support vector machine (n = 86/522, 16.5% [95% CI 13.5-19.9]) and random forest (n = 73/522, 14% [95% CI 11.3-17.2]). Values for area under the Receiver Operating Characteristic curve ranged from 0.45 to 1.00. Calibration metrics were often missed (n = 494/522, 94.6% [95% CI 92.4-96.3]).Our review revealed that focus is required on handling of missing values, methods for internal validation, and reporting of calibration to improve the methodological conduct of studies on machine learning-based prediction models.PROSPERO, CRD42019161764. |