PathSys: Integrating pathway curation, profiling methods, and public repositories: An infrastructure for functional molecular data sharing

Author: Kariotis, Sokratis
Year of publication: 2017
DOI: 10.6084/m9.figshare.5212852.v1
Description: Data integration at the level of high-dimensional molecular interrogation is confounded by the diversity of platforms and annotations of molecular events. To unify interpretation of functional activity within and between samples, we are developing a suite of tools that provide a highly standardised representation of pathway activity, networked pathway activity correlation, and pathway/disease/drug interaction. We have found that, by applying the concept of higher-order gene set interactions and using gene sets as the unit of comparison, we can unify very large sets of data without relying on gene set overlap. Pathprint is the most developed of our tools: a functional approach that summarises gene expression as a ternary summary statistic for each canonical pathway, generating a set of pathway activities, networks and transcriptionally regulated targets. It compares a sample against a background of thousands of arrays to yield a relative activity for each pathway tested, and it can be applied to gene expression profiles across species. Integration of large-scale profiling methods and curation of the public repository overcomes platform, species and batch effects to yield a standard measure of functional distance between experiments. Pathprint version 2.0, soon to be available through Bioconductor, covers 35 platforms; new additions increase the number of covered arrays to 446,708, a fourfold increase in background for pathway comparisons. Pathprint is used by the Harvard Stem Cell Commons (http://stemcellcommons.org) as part of the standardisation of representation and comparison of stem cell systems. It is being implemented within the Genometranslationcommons (https://beta.genometranslationcommons.org//#/) at the University of Sheffield and the CureADCircuitscommons (in development) as part of a Harvard/MIT/Sheffield consortium investigating the regulation of genes associated with Alzheimer's disease.
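The idea of scoring a pathway in one sample relative to a large background can be illustrated with a minimal sketch. It is not the Pathprint implementation: the rank-based pathway score, the function names, the 0.9 percentile threshold and the -1/0/1 coding of the three-valued statistic are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right
from statistics import mean


def rank_genes(expression):
    """Map each gene to its rank (1 = lowest expression) within one sample."""
    ordered = sorted(expression, key=expression.get)
    return {gene: rank for rank, gene in enumerate(ordered, start=1)}


def pathway_score(expression, gene_set):
    """Hypothetical pathway score: mean rank of the pathway's genes,
    divided by the number of genes in the sample so it lies in (0, 1]."""
    ranks = rank_genes(expression)
    present = [ranks[g] for g in gene_set if g in ranks]
    if not present:
        raise ValueError("no pathway genes found in sample")
    return mean(present) / len(ranks)


def ternary_activity(score, background_scores, threshold=0.9):
    """Return 1, -1 or 0 depending on where the score falls in the
    background distribution (threshold is an assumed cut-off)."""
    if not background_scores:
        raise ValueError("background must not be empty")
    ordered = sorted(background_scores)
    # Mid-rank percentile of the score within the background scores.
    lo = bisect_left(ordered, score)
    hi = bisect_right(ordered, score)
    percentile = (lo + hi) / (2 * len(ordered))
    if percentile >= threshold:
        return 1      # unusually high relative to the background
    if percentile <= 1 - threshold:
        return -1     # unusually low relative to the background
    return 0          # not distinguishable from the background
```

Because the score is built from within-sample ranks, it does not depend on the absolute intensity scale of a platform, which is one simple way to make samples from different arrays comparable.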
PCxN, the Pathway Co-Expression Network (Hide, Winston (2015): PCxN the Pathway co-activity Map. figshare. https://doi.org/10.6084/m9.figshare.1589792.v4), is an online web resource for discovering correlation relationships between groups of pathways or gene sets drawn from the MSigDB and Pathprint collections. Users can explore a static, extendable network by focusing on single pathways and their most correlated neighbours, and can identify relationships between groups of pathways shown by gene set enrichment to be enriched in the collections. Analyses can be viewed and exported as a heatmap, a correlation network, and gene/network tables. PCxN is employed as part of the CureADCircuits consortium (publication in preparation) and is deployed for interpretation of network and pathway relationships by the AMP-AD consortium. PDN (Pathway Drug Network), currently in development, relies on a network built from the expression correlation between each of 16,150 drug, disease and pathway gene signatures across 58,475 publicly available human microarrays (Affymetrix HGU133 Plus2), with signatures collected from the Comparative Toxicogenomics Database, PharmGKB, GeneSigDB, WikiPathways, KEGG, NetPath, Reactome, and Connectivity Map. PDN aims to use pathway-drug relationships to identify drug leads and to prioritise pathways that can be targeted in relation to disease profiles. Its prototype has been used successfully together with Pathprint at Harvard School of Public Health (Joachim R., Altschuler G., Hutchinson J., Wong H., Hide W., Kobzik L.: Pathway-centered Analysis of the Relative Resistance of Children to Sepsis Mortality, in preparation).
We have shown that PDN has a substantially higher rate of positives (p Taken as a whole, these tools provide the first standardised approach to representing systems biology, with significant new insight into the systems-level interpretation of gene set activity and the correlation between gene sets.
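The correlation networks described for PCxN and PDN can be pictured with a small sketch: summarise each gene set per sample, then correlate the summaries across samples and keep the strong pairs as edges. This is an illustrative assumption, not the published PCxN or PDN procedure; the mean-expression summary, the plain Pearson correlation, the function names and the `min_abs_r` cut-off are all hypothetical, and no adjustment is made for genes shared between sets.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pathway_summary(samples, gene_set):
    """One value per sample: mean expression of the gene set's genes."""
    return [mean([s[g] for g in gene_set if g in s]) for s in samples]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / sqrt(vx * vy)


def coexpression_edges(samples, gene_sets, min_abs_r=0.5):
    """Return (set_a, set_b, r) for every pair of gene sets whose summaries
    correlate with |r| >= min_abs_r (an assumed, arbitrary cut-off)."""
    summaries = {name: pathway_summary(samples, genes)
                 for name, genes in gene_sets.items()}
    edges = []
    for a, b in combinations(sorted(summaries), 2):
        r = pearson(summaries[a], summaries[b])
        if abs(r) >= min_abs_r:
            edges.append((a, b, r))
    return edges
```

The resulting edge list is the kind of structure that could back a heatmap or network view; a drug or disease signature can be treated as just another named gene set in `gene_sets`.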
Database: OpenAIRE