Appropriate medical data categorization for data mining classification techniques
Author: | I-Nong Lee, Shang-Chih Liao |
---|---|
Year of publication: | 2003 |
Subject: |
Ordinal data;
Male; Heart Diseases; Computer science; Taiwan; Information Storage and Retrieval; Health Informatics; computer.software_genre; Machine learning; Data type; Naive Bayes classifier; Health Information Management; Humans; Minimum description length; Categorical variable; General Nursing; business.industry; Decision Trees; Bayes Theorem; Linear discriminant analysis; ComputingMethodologies_PATTERNRECOGNITION; Categorization; Databases as Topic; Binary data; Female; Data mining; Artificial intelligence; Neural Networks, Computer; business; computer; Software |
Source: | Medical informatics and the Internet in medicine. 27(1) |
ISSN: | 1463-9238 |
Description: | Some data mining (DM) methods and software tools require normalized data, others rely on categorized data, and some can accommodate multiple data scales. Each DM technique rests on its own background theory, so different results are expected when multiple methods are applied to the same data. The purpose of this study is to find the data format appropriate for each DM classification technique, so that the techniques can be applied more widely and trustworthy results can be obtained efficiently. Given the nature of medical data, categorical variables are sometimes useful for making decisions and can make it easier to extrapolate knowledge. In this study, three mathematical data categorization methods (Fusinter, the minimum description length principle [MDLPC] and Chi-merge) were applied to prepare data for five data mining classification techniques (statistical discriminant analysis, supervised classification with neural networks, decision trees, genetic supervised clustering and Bayesian classification [probabilistic neural networks; PNN]) using a heart disease database with four types of data (continuous, binary, nominal and ordinal). Compared with original or normalized data, data categorized by the MDLPC method performed better with most of the DM classification techniques used in this study. Categorical data suits most DM classification techniques (e.g. classification of disease and non-disease groups) and is relatively easy to use for extracting medical knowledge. |
Database: | OpenAIRE |
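
The description above names MDLPC as the categorization method that worked best. As a rough illustration only, the following Python sketch shows how an entropy-based discretizer with a minimum description length stopping rule (the Fayyad and Irani criterion) might turn a continuous variable into categories. It assumes that MDLPC in the study corresponds to this criterion; the function names (`mdlp_cut_points`, `categorize`) and the toy values are hypothetical and are not taken from the study or its heart disease database.

```python
import math
from bisect import bisect_left
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(values, labels):
    """Find the binary cut with the highest information gain.

    Returns (cut, gain, left_labels, right_labels), or None when all
    values are identical and no cut is possible.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy([y for _, y in pairs])
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # a cut can only fall between two distinct values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        if best is None or gain > best[1]:
            cut = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (cut, gain, left, right)
    return best


def mdlp_accepts(labels, left, right, gain):
    """MDL stopping rule: keep the cut only if the gain pays for its coding cost."""
    n = len(labels)
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    delta = math.log2(3 ** k - 2) - (
        k * entropy(labels) - k1 * entropy(left) - k2 * entropy(right)
    )
    return gain > (math.log2(n - 1) + delta) / n


def mdlp_cut_points(values, labels):
    """Recursively collect the accepted cut points, in ascending order."""
    if len(values) < 2:
        return []
    found = best_cut(values, labels)
    if found is None:
        return []
    cut, gain, left, right = found
    if not mdlp_accepts(labels, left, right, gain):
        return []
    low = [(v, y) for v, y in zip(values, labels) if v <= cut]
    high = [(v, y) for v, y in zip(values, labels) if v > cut]
    return (
        mdlp_cut_points([v for v, _ in low], [y for _, y in low])
        + [cut]
        + mdlp_cut_points([v for v, _ in high], [y for _, y in high])
    )


def categorize(value, cuts):
    """Map a continuous value to the index of its interval."""
    return bisect_left(cuts, value)


# Hypothetical toy data: a measurement that separates two groups cleanly.
toy_values = list(range(1, 21))
toy_labels = [0] * 10 + [1] * 10
toy_cuts = mdlp_cut_points(toy_values, toy_labels)
toy_categories = [categorize(v, toy_cuts) for v in toy_values]
```

The resulting interval indices could then be passed to any classifier that expects categorical input. When no cut passes the stopping rule, the function returns an empty list and the variable collapses to a single category, which signals that the variable carries little class information under this criterion.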