On Interestingness Measures for Mining Statistically Significant and Novel Clinical Associations from EMRs.

Authors: Abar O; Dept. of Computer Science, University of Kentucky, Lexington, KY., Charnigo RJ; Department of Biostatistics, University of Kentucky, Lexington, KY., Rayapati A; Department of Psychiatry, University of Kentucky, Lexington, KY., Kavuluru R; Div. of Biomedical Informatics, Dept. of Internal Medicine, Dept. of Computer Science, University of Kentucky, Lexington, KY.
Language: English
Source: ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine [ACM BCB] 2016 Oct; Vol. 2016, pp. 587-594.
DOI: 10.1145/2975167.2985843
Abstract: Association rule mining has received significant attention from both the data mining and machine learning communities. While data mining researchers focus more on designing efficient algorithms to mine rules from large datasets, the learning community has explored applications of rule mining to classification. A major problem with rule mining algorithms is the explosion of rules even for moderately sized datasets, making it very difficult for end users to identify both statistically significant and potentially novel rules that could lead to interesting new insights and hypotheses. Researchers have proposed many domain-independent interestingness measures with which one can rank the rules and potentially glean useful rules from the top-ranked ones. However, these measures have not been fully explored for rule mining in clinical datasets owing to the relatively large sizes of the datasets often encountered in healthcare and also due to limited access to domain experts for review/analysis. In this paper, using an electronic medical record (EMR) dataset of diagnoses and medications from over three million patient visits to the University of Kentucky medical center and affiliated clinics, we conduct a thorough evaluation of dozens of interestingness measures proposed in the data mining literature, including some new composite measures. Using cumulative relevance metrics from information retrieval, we compare these interestingness measures against human judgments obtained from a practicing psychiatrist for association rules involving the depressive disorders class as the consequent. Our results not only surface new interesting associations for depressive disorders but also indicate classes of interestingness measures that weight rule novelty and statistical strength in contrasting ways, offering new insights for end users in identifying interesting rules.
Database: MEDLINE
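
Illustrative note: the evaluation described in the abstract, ranking rules by an interestingness measure and scoring each ranking against expert judgments with a cumulative relevance metric, can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. Support, confidence, and lift stand in for the dozens of measures the paper evaluates, and NDCG is assumed as one example of a cumulative relevance metric; the abstract does not say which metrics were used. The rule labels, counts, expert grades, and all function names are hypothetical.

```python
import math


def rule_measures(n_total, n_antecedent, n_consequent, n_both):
    """Support, confidence, and lift for a rule A -> C from raw visit counts."""
    support = n_both / n_total
    confidence = n_both / n_antecedent
    lift = confidence / (n_consequent / n_total)
    return {"support": support, "confidence": confidence, "lift": lift}


def dcg(gains):
    """Discounted cumulative gain of relevance grades listed in rank order."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(gains):
    """DCG normalized by the DCG of the best possible ordering of the same grades."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


# Hypothetical toy data (NOT from the paper): every rule shares one consequent.
# Each entry: (rule label, visits with antecedent, visits with both, expert grade 0-2).
N_TOTAL = 10_000       # assumed number of visits
N_CONSEQUENT = 1_000   # assumed number of visits containing the consequent
TOY_RULES = [
    ("A", 500, 200, 1),
    ("B", 50, 40, 2),
    ("C", 2000, 220, 0),
]


def rank_by(measure):
    """Order the toy rules by one measure; return (labels, expert grades) in rank order."""
    scored = [
        (rule_measures(N_TOTAL, n_a, N_CONSEQUENT, n_ab)[measure], label, grade)
        for label, n_a, n_ab, grade in TOY_RULES
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [label for _, label, _ in scored], [grade for _, _, grade in scored]


if __name__ == "__main__":
    for measure in ("support", "confidence", "lift"):
        labels, grades = rank_by(measure)
        print(measure, labels, round(ndcg(grades), 3))
```

In this toy setup a measure whose ordering agrees with the expert grades scores an NDCG near 1, while a measure that favors frequent but uninformative rules scores lower, which is the kind of contrast between measures that the abstract reports.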