Applying Multi Interval Discretization Decision Tree to Data Mining ─ Gastro Esophageal Reflux Disease Example

Author: Y. Y. Cai, 蔡依穎
Year of publication: 2009
Document type: thesis
Description: 97
Major hospitals in Taiwan increasingly apply data mining to analyze medical databases and assist medical diagnosis. Among the various data mining methods, decision trees tend to be more readily accepted because the resulting models are interpretable. However, applying binary splits to continuous attributes easily leads to trees that are too deep, too large, complex in arity, and hard to comprehend. Existing multi-interval discretization methods offer no significant improvement over the C4.5 and C5.0 algorithms, and none of them is universally applicable. This study applies a simulated annealing algorithm to the multi-interval discretization of continuous attributes in decision trees in order to reduce tree verbosity. Using a gastroesophageal reflux disease database provided by a hospital in Taichung as the sample, the results show that simulated-annealing-based multi-interval discretization outperforms C4.5 on three performance indicators: tree size, depth, and accuracy.
Database: Networked Digital Library of Theses & Dissertations
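
Illustrative note: the abstract does not specify the objective function, neighborhood move, or cooling schedule used in the thesis. The Python sketch below is therefore only one plausible reading of "simulated annealing based multi interval discretization", under these assumptions: a fixed number of cut points, candidate cuts at midpoints between adjacent distinct values, class-weighted entropy as the cost, and geometric cooling. The names `weighted_entropy` and `anneal_cuts` and all parameter defaults are hypothetical and are not taken from the thesis.

```python
import math
import random
from bisect import bisect_right
from collections import Counter


def weighted_entropy(values, labels, cuts):
    """Class entropy of the labels, averaged over the intervals defined by sorted cuts."""
    bins = {}
    for v, y in zip(values, labels):
        # bisect_right gives the index of the interval that v falls into.
        bins.setdefault(bisect_right(cuts, v), Counter())[y] += 1
    n = len(values)
    total = 0.0
    for counts in bins.values():
        size = sum(counts.values())
        h = -sum((c / size) * math.log2(c / size) for c in counts.values())
        total += (size / n) * h
    return total


def anneal_cuts(values, labels, n_cuts=2, t0=1.0, cooling=0.95, steps=2000, seed=0):
    """Search for n_cuts cut points with low weighted entropy (assumed cost function)."""
    rng = random.Random(seed)
    distinct = sorted(set(values))
    # Assumption: candidate cuts are midpoints between adjacent distinct values.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    if len(candidates) < n_cuts:
        raise ValueError("not enough distinct values for the requested number of cuts")

    current = sorted(rng.sample(candidates, n_cuts))
    current_cost = weighted_entropy(values, labels, current)
    best, best_cost = current, current_cost
    temp = t0

    for _ in range(steps):
        # Neighbor move: replace one randomly chosen cut with a random candidate.
        proposal = list(current)
        proposal[rng.randrange(n_cuts)] = rng.choice(candidates)
        proposal = sorted(set(proposal))
        if len(proposal) < n_cuts:
            continue  # the move duplicated an existing cut; skip it

        cost = weighted_entropy(values, labels, proposal)
        delta = cost - current_cost
        # Metropolis rule: always accept improvements, sometimes accept worse states.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = proposal, cost
            if cost < best_cost:
                best, best_cost = proposal, cost

        # Geometric cooling with a small floor to keep the division well defined.
        temp = max(temp * cooling, 1e-9)

    return best, best_cost
```

Calling `anneal_cuts(values, labels, n_cuts=2)` on one continuous attribute returns the best cut points found and their cost. The resulting intervals could then serve as a single multiway split in a decision tree, in place of a chain of binary splits on the same attribute.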