Feature Selection Based on Shapley Additive Explanations on Metagenomic Data for Colorectal Cancer Diagnosis

Authors: Tran Thanh Dien, Nguyen Thanh-Hai, Nguyen Thai-Nghe, Toan Bao Tran, Nhi Yen Kim Phan
Year of publication: 2021
Source: Soft Computing: Biomedical and Related Applications, ISBN: 9783030766191
DOI: 10.1007/978-3-030-76620-7_6
Description: Personalized medicine is one of the most active current approaches to caring for and improving human health. Scientists working on personalized-medicine projects often regard metagenomic data as a valuable source for developing and proposing methods for disease treatment. Processing metagenomic data is challenging, however, because of its high dimensionality and complexity. Numerous studies have attempted to find biomarkers, i.e., medical signs that are significantly associated with a disease. In this study, we propose an approach based on Shapley Additive Explanations (SHAP), a model explainability method, to select informative features from metagenomic data and thereby improve disease prediction. The proposed feature selection method is evaluated on more than 500 colorectal cancer samples from several geographic regions, including France, China, the United States, Austria, and Germany. The set of 10 features selected with SHAP achieves significant results compared with feature selection based on the Pearson coefficient, and it also performs comparably to the original feature set of approximately 2,000 features.
Database: OpenAIRE
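The abstract contrasts two ways of choosing a small feature subset: ranking by a Pearson coefficient and ranking by SHAP importance. The dependency-free Python sketch below illustrates both ideas under stated assumptions; it is not the authors' pipeline. It assumes that the Pearson variant ranks features by |r| with the class label, and that SHAP importance is the mean absolute Shapley value per feature. Shapley values are computed exactly by subset enumeration, with "absent" features replaced by their mean over the data (a simple baseline choice). All function names are hypothetical. Exact enumeration is exponential in the number of features, so for roughly 2,000 metagenomic features one would instead use an approximate explainer, such as those in the `shap` package, on a trained classifier.

```python
from itertools import combinations
from math import factorial, sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def top_k_by_pearson(X, y, k):
    """Assumed variant: rank features by |r| with the label y, keep the top k."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: -scores[j])[:k]


def shapley_values(model, x, baseline):
    """Exact Shapley values of model at x; absent features take baseline values."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


def top_k_by_shap(model, X, k):
    """Rank features by mean |Shapley value| over all samples, keep the top k."""
    n_features = len(X[0])
    baseline = [mean(row[j] for row in X) for j in range(n_features)]
    importance = [0.0] * n_features
    for x in X:
        for j, v in enumerate(shapley_values(model, x, baseline)):
            importance[j] += abs(v) / len(X)
    return sorted(range(n_features), key=lambda j: -importance[j])[:k]


# Toy usage with a hypothetical fitted model f(v) = 2*v0 - v2
X = [[1, 5, 0], [2, 5, 1], [3, 5, 0], [4, 5, 1]]
y = [0, 1, 0, 1]
model = lambda v: 2 * v[0] - v[2]
print(top_k_by_pearson(X, y, 2))  # [2, 0]
print(top_k_by_shap(model, X, 2))  # [0, 2]
```

The two rankings disagree in the toy example. Pearson selection looks only at the marginal association between each feature and the label. SHAP selection instead reflects how much each feature actually moves the trained model's output, which is the property the abstract exploits.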