Mining comorbidity patterns using retrospective analysis of big collection of outpatient records
Authors: | Svetla Boytcheva, Galia Angelova, Zhivko Angelov, Dimitar Tcharaktchiev |
---|---|
Contributors: | Boytcheva, Svetla |
Year of publication: | 2017 |
Subject: | business.industry; Schizophrenia (object-oriented programming); Research; Natural language processing; Context (language use); 02 engineering and technology; Comorbidity; medicine.disease; Data science; Health informatics; 03 medical and health sciences; Identification (information); 0302 clinical medicine; Text mining; 0202 electrical engineering, electronic engineering, information engineering; Retrospective analysis; Medicine; 020201 artificial intelligence & image processing; 030212 general & internal medicine; business; Maximal frequent patterns mining; Data mining; Reimbursement |
Description: | Background: Studying comorbidities of disorders is important for detection and prevention. Frequent patterns of diseases can be discovered through retrospective analysis of population data, by filtering events with common properties and similar significance. Most frequent pattern mining methods do not consider contextual information about the extracted patterns. Further data mining developments might enable more efficient applications in specific tasks such as comorbidity identification. Methods: We propose a cascade data mining approach for frequent pattern mining enriched with context information, including a new algorithm, MIxCO, for maximal frequent pattern mining. Text mining tools extract entities from free text and deliver additional context attributes beyond the structured information about the patients. Results: The proposed approach was tested using pseudonymised reimbursement requests (outpatient records) submitted to the Bulgarian National Health Insurance Fund in 2010–2016 for more than 5 million citizens yearly. Experiments were run on 3 data collections. Some known comorbidities of Schizophrenia, Hyperprolactinemia and Diabetes Mellitus Type 2 are confirmed; novel hypotheses about stable comorbidities are generated. The evaluation shows that MIxCO is efficient for big dense datasets. Conclusion: Explicating maximal frequent itemsets makes it possible to build hypotheses concerning the relationships between the exogenous and endogenous factors triggering the formation of these sets. MIxCO will help to identify risk groups of patients with a predisposition to develop socially significant disorders like diabetes. This will turn static archives like the Diabetes Register in Bulgaria into a powerful alerting and predictive framework. |
Database: | OpenAIRE |
External link: |
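
The description refers to maximal frequent itemsets without defining them. As a rough illustration only: an itemset is frequent if it occurs in at least a minimum number of records, and it is maximal if no frequent proper superset exists. The sketch below is a brute-force enumeration and is not the MIxCO algorithm, whose details are not given in this record; the function names, the item labels A to D, the toy records and the support threshold are all assumptions made for the example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for every itemset found in >= min_support records."""
    items = sorted({item for t in transactions for item in t})
    frequent = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            count = sum(1 for t in transactions if cset <= t)
            if count >= min_support:
                frequent[cset] = count
                found = True
        if not found:
            # Every subset of a frequent set is frequent, so if no set of
            # this size is frequent, no larger set can be either.
            break
    return frequent


def maximal_frequent_itemsets(transactions, min_support):
    """Keep only the frequent itemsets that have no frequent proper superset."""
    frequent = frequent_itemsets(transactions, min_support)
    return [s for s in frequent if not any(s < other for other in frequent)]


# Toy records (hypothetical): each set stands for the codes seen in one record.
transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "B", "C"},
    {"C", "D"},
    {"A", "C"},
    {"C", "D"},
]

for itemset in maximal_frequent_itemsets(transactions, min_support=2):
    print(sorted(itemset))
```

With these toy records and a threshold of 2, the maximal frequent itemsets are {A, B, C} and {C, D}; smaller frequent sets such as {A, B} are dropped because a frequent superset covers them. Exhaustive enumeration grows exponentially with the number of items, so it only suits small examples like this one.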