Feature Selection Based on Shapley Additive Explanations on Metagenomic Data for Colorectal Cancer Diagnosis
Authors: | Tran Thanh Dien, Nguyen Thanh-Hai, Nguyen Thai-Nghe, Toan Bao Tran, Nhi Yen Kim Phan |
---|---|
Publication year: | 2021 |
Subject: | Data source; Computer science; Feature selection; Machine learning; Pearson product-moment correlation coefficient; Set (abstract data type); Human health; Metagenomics; Artificial intelligence; Personalized medicine; High dimensionality |
Source: | Soft Computing: Biomedical and Related Applications ISBN: 9783030766191 |
DOI: | 10.1007/978-3-030-76620-7_6 |
Description: | Personalized medicine is one of the most active current approaches to caring for and improving human health. Scientists working on personalized medicine projects usually regard metagenomic data as a valuable source for developing and proposing disease treatment methods. Processing metagenomic data is challenging because of its high dimensionality and complexity. Numerous studies have attempted to find biomarkers, that is, medical signs significantly related to diseases. In this study, we propose an approach based on Shapley Additive Explanations, a model explainability method, to select valuable features from metagenomic data and improve disease prediction. The proposed feature selection method is evaluated on more than 500 colorectal cancer samples from various geographic regions, including France, China, the United States, Austria, and Germany. The set of 10 features selected with Shapley Additive Explanations achieves significant results compared with the feature selection method based on the Pearson correlation coefficient, and it performs comparably to the original set of approximately 2000 features. |
Database: | OpenAIRE |