Description: |
The integration of microarray technologies and machine learning methods has become popular for predicting the pathological condition of diseases and discovering risk genes. Traditional microarray analysis treats pathways as simple gene sets, weighting all genes in a pathway identically and ignoring the structural information of the pathway network. This study proposes an entropy-based directed random walk (e-DRW) method to infer pathway activity. It aims (1) to enhance the gene-weighting method in Directed Random Walk (DRW) by incorporating t-test statistic scores and correlation coefficient values, (2) to implement entropy as a parameter variable for random walking in a biological network, and (3) to apply the Entropy Weight Method (EWM) in DRW pathway activity inference. To meet these objectives, a gene expression dataset was used as input and a pathway dataset was used as the reference for building a directed graph. An equation was proposed to assess the connectivity of nodes in the directed graph via probability values calculated from the Shannon entropy formula. A worked calculation based on the proposed formula was presented using e-DRW with gene expression data. The results showed that e-DRW improved on conventional DRW in both prediction sensitivity and cancer classification accuracy. In within-dataset experiments, the proposed method was robust and outperformed the other DRW methods in accuracy and in the number of predicted risk-active pathways. In conclusion, e-DRW not only improved prediction performance but also effectively extracted topologically important pathways and genes that are specifically related to the corresponding cancer types. |
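
The abstract names the Shannon entropy formula and the Entropy Weight Method but does not reproduce the proposed equation. As a rough illustration only, the sketch below implements the generic textbook form of EWM: each column of a non-negative matrix is normalised to a probability distribution, its Shannon entropy is scaled by ln(n), and columns with lower entropy (more variation) receive higher weight. This is not the e-DRW formula; the function names `shannon_entropy` and `entropy_weights` and the toy matrix are assumptions made for this example.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (natural log) of a discrete distribution.

    Terms with p == 0 are skipped, following the convention 0 * log(0) = 0.
    """
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_weights(matrix):
    """Textbook Entropy Weight Method over the columns of a matrix.

    `matrix` is a list of rows with non-negative entries (hypothetical
    layout: rows are samples, columns are features). Returns one weight
    per column; the weights sum to 1.
    """
    n_rows = len(matrix)
    if n_rows < 2:
        raise ValueError("need at least two rows to normalise by log(n)")
    n_cols = len(matrix[0])

    divergences = []
    for j in range(n_cols):
        column = [row[j] for row in matrix]
        total = sum(column)
        if total <= 0:
            raise ValueError("each column must have a positive sum")
        probs = [x / total for x in column]
        # Normalised entropy lies in [0, 1]; 1 means a uniform column.
        e_j = shannon_entropy(probs) / math.log(n_rows)
        # Clamp to guard against tiny negative values from rounding.
        divergences.append(max(0.0, 1.0 - e_j))

    total_div = sum(divergences)
    if total_div == 0:
        # Every column is uniform, so fall back to equal weights.
        return [1.0 / n_cols] * n_cols
    return [d / total_div for d in divergences]


# Toy data: the first column is constant, the second varies.
weights = entropy_weights([[1, 1], [1, 3], [1, 8]])
print(weights)
```

In this toy input the constant first column carries no discriminating information, so nearly all of the weight goes to the second column. How e-DRW combines such weights with t-test scores, correlation coefficients, and the random walk is defined in the paper itself and is not reconstructed here.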