A gene selection approach for classifying diseases based on microarray datasets
Authors: | Adel A. Sewissy, Taysir Hassan A. Soliman, Hisham AbdelLatif |
---|---|
Year of publication: | 2010 |
Subject: |
Computer science; Decision tree; Pattern recognition; Feature selection; Machine learning; Support vector machine; Naive Bayes classifier; Statistical classification; ComputingMethodologies_PATTERNRECOGNITION; Artificial intelligence; Cluster analysis; Selection (genetic algorithm); ID3 |
Source: | 2010 2nd International Conference on Computer Technology and Development. |
DOI: | 10.1109/icctd.2010.5645975 |
Description: | Gene selection is a very important problem in the classification of serious diseases in clinical information systems. A limitation of existing gene selection methods is that they may produce gene sets with some redundancy and yield an unnecessarily large number of candidate genes for classification analysis. In the current work, a hybrid approach is presented to classify diseases, such as colon cancer, leukemia, and liver cancer, based on informative genes. This hybrid approach uses clustering (K-means) with statistical analysis (ANOVA) as a preprocessing step for gene selection, and Support Vector Machines (SVM) to classify diseases related to microarray experiments. To evaluate the performance of the proposed methodology, two kinds of comparisons were performed: 1) preprocessing with statistical analysis alone versus statistical analysis combined with a clustering algorithm (K-means), and 2) different classification algorithms: decision tree (ID3), Naive Bayes, Adaptive Naive Bayes, and Support Vector Machines. Combining clustering with statistical analysis gives a classification accuracy of 97%, much better than the accuracy obtained without clustering in the preprocessing phase. In addition, SVM achieved better accuracy than decision trees, Naive Bayes, and Adaptive Naive Bayes classification. |
Database: | OpenAIRE |
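The abstract names K-means and ANOVA as the gene selection step but does not say how the two are combined. The sketch below is one plausible reading, not the authors' method. It assumes genes are clustered by their expression profiles and that the gene with the highest one-way ANOVA F-statistic is kept from each cluster, so that near-duplicate genes do not both survive. The function names (`anova_f`, `kmeans`, `select_genes`) and the toy data are hypothetical, and the SVM classification stage is left out.

```python
import random
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F-statistic of one gene's values across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assign = None
    for _ in range(iters):
        new_assign = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_assign == assign:  # assignments stable: converged
            break
        assign = new_assign
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [mean(col) for col in zip(*members)]
    return assign


def select_genes(expr, labels, k):
    """Assumed combination: keep the highest-F gene from each K-means cluster."""
    clusters = kmeans(expr, k)
    scores = [anova_f(gene, labels) for gene in expr]
    best = {}
    for idx, c in enumerate(clusters):
        if c not in best or scores[idx] > scores[best[c]]:
            best[c] = idx
    return sorted(best.values())


# Toy data (made up): rows are genes, columns are samples.
labels = [0, 0, 0, 1, 1, 1]
expr = [
    [1.0, 1.1, 0.9, 5.0, 5.1, 4.9],    # informative
    [1.05, 1.1, 0.95, 5.0, 5.05, 4.9], # near-duplicate of the gene above
    [3.0, 3.1, 2.9, 3.0, 3.1, 2.9],    # uninformative
    [2.9, 3.0, 3.1, 3.1, 2.9, 3.0],    # uninformative
]
selected = select_genes(expr, labels, k=2)
print(selected)
```

On real microarray data the selected rows would then be passed to a classifier. Libraries such as scikit-learn provide `f_classif`, `KMeans`, and `SVC` for these stages; the pure-Python versions here only keep the sketch self-contained.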