Author: |
Mwenge Mulenga, Sameem Abdul Kareem, Aznul Qalid Md Sabri, Manjeevan Seera |
Language: |
English |
Year of publication: |
2021 |
Subject: |
|
Source: |
IEEE Access, Vol 9, Pp 97296-97319 (2021) |
Document type: |
article |
ISSN: |
2169-3536 |
DOI: |
10.1109/ACCESS.2021.3094529 |
Description: |
Machine learning (ML)-based detection of diseases using sequence-based gut microbiome data has been of great interest within the artificial intelligence in medicine (AIM) community. The approach offers a non-invasive, stool-sample-based alternative for colorectal cancer (CRC) detection. Given the limitations of existing CRC detection methods, medical research has shown interest in using high-throughput data to identify the disease. Because conventional ML algorithms have several limitations, deep learning (DL) methods, which perform strongly in related fields, are becoming more popular. However, the performance of DL methods is affected by limitations inherent in microbiome data, such as dimensionality, sparsity, and feature dominance. This research proposes stacking and chaining of normalization methods to address these limitations. The stacking technique offers a robust, easy-to-use, and interpretable alternative for augmenting microbiome and other tabular data, while the chaining technique is an alternative to data normalization that dynamically adjusts the underlying properties of the data towards the normal distribution. The proposed techniques are combined with rank transformation and feature selection to further improve model performance, yielding area under the curve (AUC) values between 0.857 and 0.987 on publicly available datasets. |
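The abstract does not spell out how stacking and chaining are implemented, so the following is only a minimal sketch of one possible reading, not the authors' method. It assumes that chaining means applying several normalizers in sequence to the same table, and that stacking means concatenating differently normalized copies of the samples row-wise (labels repeated) as a form of augmentation. The function names, the choice of log1p, z-score, and min-max normalizers, and the toy count table are all hypothetical.

```python
import math
from statistics import mean, pstdev

# Hypothetical normalizers; the paper's actual choices are not given in the abstract.
def log1p_transform(table):
    """Element-wise log(1 + x), a common way to damp dominant counts."""
    return [[math.log1p(v) for v in row] for row in table]

def zscore(table):
    """Per-feature standardization; constant features are left at zero."""
    cols = list(zip(*table))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in table]

def min_max(table):
    """Per-feature rescaling to the range [0, 1]."""
    cols = list(zip(*table))
    los = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - lo) / sp for v, lo, sp in zip(row, los, spans)] for row in table]

def chain(table, steps):
    """Assumed reading of 'chaining': feed each normalizer's output to the next."""
    for step in steps:
        table = step(table)
    return table

def stack(table, labels, normalizers):
    """Assumed reading of 'stacking': row-wise concatenation of normalized copies."""
    rows, ys = [], []
    for norm in normalizers:
        rows.extend(norm(table))
        ys.extend(labels)
    return rows, ys

# Toy sparse count table (4 samples x 3 taxa) with made-up labels.
counts = [[0, 12, 3], [5, 0, 40], [1, 7, 0], [0, 30, 2]]
labels = [0, 1, 0, 1]

chained = chain(counts, [log1p_transform, zscore])
stacked_rows, stacked_labels = stack(counts, labels, [log1p_transform, min_max])
```

Under these assumptions, chaining leaves the number of samples unchanged, whereas stacking multiplies it by the number of normalizers used. In this sketch the normalization statistics are computed on the whole table; in a real train/test setting they would need to be fitted on the training split only.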
Database: |
Directory of Open Access Journals |
External link: |
|