DNA Methylation Prediction Using Reduced Features Obtained via Gappy Pair Kernel and Partial Least Square

Authors: Sajid Shah, Altaf Ur Rahman, Saima Jabeen, Ahmad Khan, Fiaz Gul Khan, Mohammed Elaffendi
Language: English
Year of publication: 2022
Subject:
Source: IEEE Access, Vol 10, Pp 53265-53274 (2022)
Document type: article
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2022.3174260
Description: Correctly identifying DNA methylation is critical because it has been linked to a variety of human disorders, particularly cancer. DNA methylation is an epigenetic process that allows cells to alter gene expression. This work deals with a type of DNA methylation called 5-methylcytosine (m5C), in which the methyl group ($\mathrm{CH}_{3}$) is attached to the fifth carbon of cytosine. The performance of machine learning algorithms used for methylation identification degrades considerably when the input sequence data are poorly represented. In this work, we propose a classification model based on extracting highly discriminative features from the sample sequences using the gappy pair kernel. Increasing the number of features to better represent a sequence leads to the curse of dimensionality, which is handled by a dimensionality reduction technique called partial least squares (PLS). The reduced features are then passed to multiple classifiers to test their discriminating power. Results are computed across species, i.e., human and mouse, to check the robustness of the proposed model. Finally, the results are compared with state-of-the-art approaches in terms of sensitivity, specificity, and accuracy. The proposed approach outperforms state-of-the-art techniques on all three metrics for both datasets. So that the research community can test our technique, we have uploaded our code to GitHub (https://github.com/sajidshahbs/gappypairKernel_Rcode).
Database: Directory of Open Access Journals
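
The pipeline outlined in the description (gappy pair features followed by PLS reduction) can be illustrated with a minimal sketch. This is not the authors' code, which is written in R and available at the GitHub link above. It assumes the common definition of gappy pair features, namely counts of two k-mers separated by 0 to m arbitrary positions, and it uses a plain NIPALS-style PLS1 to compute the reduced scores. The function names `gappy_pair_features` and `pls1_scores`, the parameters `k`, `m`, and `n_components`, and the toy sequences and labels are all hypothetical and chosen only for illustration.

```python
from itertools import product
from math import sqrt

ALPHABET = "ACGT"  # assumption: plain DNA alphabet, no ambiguity codes


def gappy_pair_features(seq, k=1, m=2):
    """Count pairs of k-mers separated by 0..m arbitrary positions.

    Returns a vector of length (m + 1) * 4**(2 * k).
    """
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    keys = [(a, g, b) for a in kmers for g in range(m + 1) for b in kmers]
    index = {key: i for i, key in enumerate(keys)}
    counts = [0.0] * len(keys)
    for g in range(m + 1):
        span = 2 * k + g  # first k-mer + gap + second k-mer
        for s in range(len(seq) - span + 1):
            key = (seq[s:s + k], g, seq[s + k + g:s + span])
            if key in index:  # windows with non-ACGT symbols are skipped
                counts[index[key]] += 1.0
    return counts


def pls1_scores(X, y, n_components=2):
    """Project the rows of X onto PLS1 components (NIPALS with deflation).

    X is a list of feature vectors, y a list of numeric labels.
    Returns one list of component scores per row of X.
    """
    n, d = len(X), len(X[0])
    # Center the features and the response.
    col_mean = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - col_mean[j] for j in range(d)] for row in X]
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    scores = [[] for _ in range(n)]
    for _ in range(n_components):
        # Weight vector: direction of maximal covariance with y.
        w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(d)]
        norm = sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # no covariance left to explain
            break
        w = [v / norm for v in w]
        # Scores of every sample on this component.
        t = [sum(Xc[i][j] * w[j] for j in range(d)) for i in range(n)]
        tt = sum(v * v for v in t)
        # Loadings, then deflate X so the next component is orthogonal.
        p = [sum(Xc[i][j] * t[i] for i in range(n)) / tt for j in range(d)]
        Xc = [[Xc[i][j] - t[i] * p[j] for j in range(d)] for i in range(n)]
        for i in range(n):
            scores[i].append(t[i])
    return scores


if __name__ == "__main__":
    # Toy data, invented for illustration only (1 = methylated, 0 = not).
    seqs = ["ACGTCGACGT", "TTCGCGAATC", "AATTAATTAA", "TATATTTAAT"]
    labels = [1.0, 1.0, 0.0, 0.0]
    X = [gappy_pair_features(s, k=1, m=2) for s in seqs]
    for seq, row in zip(seqs, pls1_scores(X, labels, n_components=2)):
        print(seq, [round(v, 3) for v in row])
```

The resulting low-dimensional scores would then be handed to a classifier of choice. With `k=1` and `m=2` each sequence is mapped to 48 counts, which shows how quickly the feature space grows with `k` and why a reduction step such as PLS is useful.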