A methodology based on Trace-based clustering for patient phenotyping
Author: | Antonio Lopez-Martinez-Carrasco, Jose M. Juarez, Bernardo Canovas-Segura, Manuel Campos |
---|---|
Publication year: | 2021 |
Subject: |
Information Systems and Management; Computer science; Machine learning; Management Information Systems; Set (abstract data type); Consistency (database systems); Identification (information); Sørensen–Dice coefficient; Knowledge extraction; Artificial intelligence; Unsupervised learning; Cluster analysis; Software; Selection (genetic algorithm) |
Source: | Knowledge-Based Systems. 232:107469 |
ISSN: | 0950-7051 |
Description: | Background: The critical rise of bacterial resistance to antibiotics has led to the use of machine learning techniques to provide clinicians with new knowledge for decision making. One key aspect is precision medicine, which focuses on finding phenotypes of patients for whom treatments may be more effective, or on detecting high-risk patients whose progress must be closely monitored. Identifying these phenotypes requires a methodology whose results are consistent and interpretable, and whose process is controlled by a clinical expert. Studies on machine learning phenotyping use conventional clustering or subgroup algorithms that require information to be obtained a priori. Methods: We propose a new unsupervised machine learning technique, called Trace-based clustering, and a 5-step methodology to support clinicians in identifying patient phenotypes. The proposed steps are: (1) Extraction and transformation of data and analysis of clustering tendency, (2) Selection of clustering algorithm and parameters, (3) Automatic generation of candidate clusters, (4) Visual support for selection of candidate clusters, and (5) Evaluation by clinical experts. Experiments and Results: We undertake an antimicrobial resistance use case employing the MIMIC-III open-access database for patients infected with methicillin-resistant Staphylococcus aureus and Enterococcus faecium treated with vancomycin. The experiments used the Hopkins statistic to evaluate the clustering tendency of the data, the K-Means algorithm for clustering, and the Dice coefficient to measure the similarity of the clusters. Our experiments computed 370 potential patient sets (clusters), from which 19 candidate clusters were obtained for final evaluation.
We evaluated the final result with a classification model to ensure the consistency of the phenotypes obtained, and we compared the result with a traditional clustering approach. We found a reduced set of consistent candidate clusters with a common phenotype (resistance and death), which differed from the other candidate clusters. An expert in the domain could add labels with clinical meaning to the reduced number of clusters. Conclusions: We show that the proposed methodology allows physicians to identify consistent patient phenotypes. Our experiments confirm that quality measures and visual analysis could help expert clinicians to control the knowledge discovery process and obtain interpretable results. Our approach provides a new perspective: finding patient sets using clustering techniques evaluated by overlapping clusters of the previous partitions. The proposed method is general and can be easily adapted to other problems and other clinical settings. |
Database: | OpenAIRE |
External link: |
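
The description names the Dice coefficient as the measure of cluster similarity and mentions evaluating clusters by their overlap with clusters of previous partitions. The following is a minimal sketch of that idea only, not the authors' Trace-based clustering implementation: it treats each cluster as a set of patient identifiers and, for each cluster in one partition, reports its best Dice score against the clusters of an earlier partition. The function names `dice` and `best_matches`, the patient IDs, and the convention for two empty sets are all assumptions made for illustration.

```python
from typing import Hashable, List, Set


def dice(a: Set[Hashable], b: Set[Hashable]) -> float:
    """Sørensen–Dice coefficient of two sets: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # assumed convention: two empty sets count as identical
    return 2 * len(a & b) / (len(a) + len(b))


def best_matches(reference: List[Set[Hashable]],
                 previous: List[Set[Hashable]]) -> List[float]:
    """For each reference cluster, the highest Dice score against any
    cluster of the previous partition (hypothetical helper)."""
    return [max(dice(r, p) for p in previous) for r in reference]


# Hypothetical patient IDs grouped by two partitions, e.g. K-Means runs
# with k = 2 and k = 3 over the same patients.
partition_k2 = [{1, 2, 3, 4}, {5, 6, 7, 8}]
partition_k3 = [{1, 2, 3}, {4, 5}, {6, 7, 8}]

scores = best_matches(partition_k3, partition_k2)
for cluster, score in zip(partition_k3, scores):
    print(sorted(cluster), round(score, 3))
```

In this toy example, a cluster that persists almost unchanged across the two partitions gets a score close to 1, while a cluster assembled from fragments of several earlier clusters scores low. How such scores are aggregated and thresholded to select candidate clusters is specified in the paper, not here.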