Associating Genes and Protein Complexes with Disease via Network Propagation

Authors: Tomer Shlomi, Oded Magger, Roded Sharan, Oron Vanunu, Eytan Ruppin
Year of publication: 2010
Subjects:
Male
Association (object-oriented programming)
Inference
Disease
Computational biology
Biology
Protein–protein interaction
Cellular and Molecular Neuroscience
Diabetes mellitus genetics
Alzheimer Disease
Databases, Genetic
Protein Interaction Mapping
Diabetes Mellitus
Genetics
Humans
Diabetes and Endocrinology/Type 2 Diabetes
lcsh:QH301-705.5
Molecular Biology
Gene
Genetics and Genomics/Genetics of Disease
Ecology, Evolution, Behavior and Systematics
Oncology/Prostate Cancer
Computational Biology/Systems Biology
Ecology
Prostatic Neoplasms
Proteins
Reproducibility of Results
Genetics and Genomics/Bioinformatics
Genes
Genetics and Genomics/Disease Models
lcsh:Biology (General)
Computational Theory and Mathematics
Multiprotein Complexes
Modeling and Simulation
Identification (biology)
Neurological Disorders/Alzheimer Disease
Algorithms
Function (biology)
Research Article
Computational Biology/Genomics
Source: PLoS Computational Biology, Vol 6, Iss 1, p e1000641 (2010)
ISSN: 1553-7358
DOI: 10.1371/journal.pcbi.1000641
Description: A fundamental challenge in human health is the identification of disease-causing genes. Recently, several studies have tackled this challenge via a network-based approach, motivated by the observation that genes causing the same or similar diseases tend to lie close to one another in a network of protein-protein or functional interactions. However, most of these approaches use only local network information in the inference process and are restricted to inferring single gene associations. Here, we provide a global, network-based method for prioritizing disease genes and inferring protein complex associations, which we call PRINCE. The method is based on formulating constraints on the prioritization function that relate to its smoothness over the network and its use of prior information. We exploit this function to predict not only genes but also protein complex associations with a disease of interest. We test our method on gene-disease association data, evaluating both the prioritization achieved and the protein complexes inferred. We show that our method outperforms extant approaches in both tasks. Using data on 1,369 diseases from the OMIM knowledgebase, our method is able (in a cross-validation setting) to rank the true causal gene first for 34% of the diseases, and to infer 139 disease-related complexes that are highly coherent in terms of the function, expression and conservation of their member proteins. Importantly, we apply our method to study three multi-factorial diseases for which some causal genes have already been found: prostate cancer, Alzheimer's disease and type 2 diabetes mellitus. PRINCE's predictions for these diseases closely match the known literature, suggesting several novel causal genes and protein complexes for further investigation.
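The description characterizes PRINCE only as a prioritization function that is smooth over the network while staying consistent with prior information. One standard way to compute such a function is iterative network propagation, F ← αW′F + (1 − α)Y, where W′ is the symmetrically normalized interaction matrix D^(-1/2) W D^(-1/2) and Y holds the prior scores. The sketch below implements that generic scheme in plain Python. The update rule, α = 0.9, the function names, the toy network and the mean-score rule for complexes are assumptions for illustration, not details confirmed by this record.

```python
from math import sqrt


def propagate(edges, prior, alpha=0.9, tol=1e-9, max_iter=1000):
    """Generic network propagation sketch: iterate F <- alpha * W'F + (1 - alpha) * Y.

    edges: {(u, v): weight} for an undirected network without self-loops,
           each interacting pair listed once.
    prior: {node: score}, the prior information Y; unlisted nodes get 0.
    W' is the symmetrically normalized adjacency D^-1/2 W D^-1/2, so with
    0 < alpha < 1 the iteration converges to a unique fixed point.
    """
    neighbors, degree = {}, {}
    for (u, v), w in edges.items():
        neighbors.setdefault(u, []).append((v, w))
        neighbors.setdefault(v, []).append((u, w))
        degree[u] = degree.get(u, 0.0) + w
        degree[v] = degree.get(v, 0.0) + w

    nodes = sorted(set(neighbors) | set(prior))
    y = {n: prior.get(n, 0.0) for n in nodes}
    f = dict(y)
    for _ in range(max_iter):
        new_f = {}
        for n in nodes:
            # Smoothness term: normalized sum of the neighbors' current scores.
            spread = sum(w / sqrt(degree[n] * degree[m]) * f[m]
                         for m, w in neighbors.get(n, []))
            # Prior term keeps the scores anchored to the known associations.
            new_f[n] = alpha * spread + (1 - alpha) * y[n]
        change = max(abs(new_f[n] - f[n]) for n in nodes)
        f = new_f
        if change < tol:
            break
    return f


def complex_score(scores, members):
    """Hypothetical complex score: mean propagated score of the member proteins."""
    return sum(scores.get(m, 0.0) for m in members) / len(members)


# Toy path network A-B-C-D with the prior placed on A only.
edges = {("A", "B"): 1.0, ("B", "C"): 1.0, ("C", "D"): 1.0}
scores = propagate(edges, {"A": 1.0})
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['A', 'B', 'C', 'D']
print(complex_score(scores, ["A", "B"]) > complex_score(scores, ["C", "D"]))  # True
```

Because every node's score mixes its neighbors' scores with its own prior, genes far from any known disease gene still receive a nonzero score, which is what makes the prioritization global rather than limited to direct neighbors of known causal genes.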
Author Summary: Understanding the genetic background of diseases is crucial to medical research, with implications in diagnosis, treatment and drug development. As molecular approaches to this challenge are time-consuming and costly, computational approaches offer an efficient alternative. Such approaches aim at prioritizing genes in a genomic interval of interest according to their predicted strength of association with a given disease. State-of-the-art prioritization methods are based on the observation that genes causing similar diseases tend to lie close to one another in a network of protein-protein interactions. Here we develop a novel prioritization approach that uses the network data in a global manner and can associate not only single genes but also whole protein machineries with a given disease. Our method, PRINCE, is shown to outperform previous methods in both the gene prioritization task and the protein complex inference task. Applying PRINCE to prostate cancer, Alzheimer's disease and type 2 diabetes, we are able to infer new causal genes and related protein complexes with high confidence.
Database: OpenAIRE