A scalable random walk with restart on heterogeneous networks with Apache Spark for ranking disease-causing genes using type-2 fuzzy data fusion

Autor: Zeinab Maleki, Mehdi Joodaki, Nasser Ghadiri, Maryam Lotfi Shahreza
Jazyk: angličtina
Rok vydání: 2019
Předmět:
DOI: 10.1101/844159
Popis: Prediction and discovery of disease-causing genes are among the main missions of biology and medicine. In recent years, researchers have developed several methods based on gene/protein networks for the detection of causative genes. However, because of the presence of false positives in these networks, the results of these methods often lack accuracy and reliability. This problem can be solved by using multiple genomic sources to reduce noise in data. However, network integration can also affect the quality of the integrated network. In this paper, we present a method named RWRHN (random walk with restart on a heterogeneous network) with fuzzy fusion or RWRHN-FF. In this method, first, four gene-gene similarity networks are constructed based on different genomic sources and then integrated using the type-II fuzzy voter scheme. The resulting gene-gene network is then linked to a disease-disease similarity network, which itself is constructed by the integration of four sources, through a two-part disease-gene network. The product of this process is a reliable heterogeneous network, which is analyzed by the RWRHN algorithm. The results of the analysis with the leave-one-out cross-validation method show that RWRHN-FF outperforms both RWRHN and RWRH. The proposed method is used to predict new genes for prostate, breast, gastric and colon cancers. To reduce the algorithm run time, Apache Spark is used as a platform for parallel execution of the RWRHN algorithm on heterogeneous networks. In the test conducted on heterogeneous networks of different sizes, this solution results in faster convergence than other non-distributed modes of implementations.
Databáze: OpenAIRE