Precision annotation of digital samples in NCBI's gene expression omnibus.

Authors: Hadley D; Institute for Computational Health Sciences, University of California, San Francisco, California 94158, USA., Pan J; Department of Neurosurgery, Stanford University School of Medicine, Stanford, California 94305, USA., El-Sayed O; University of Illinois College of Medicine, Chicago, Illinois 60612, USA., Aljabban J; Harvard Medical School Department of Immunology, Harvard University, Boston, Massachusetts 02115, USA., Aljabban I; Harvard Medical School Department of Immunology, Harvard University, Boston, Massachusetts 02115, USA., Azad TD; Department of Neurosurgery, Stanford University School of Medicine, Stanford, California 94305, USA., Hadied MO; Wayne State University School of Medicine, Detroit, Michigan 48201, USA., Raza S; Yale School of Medicine, Yale University, New Haven, Connecticut 06519, USA., Rayikanti BA; University of Vermont Medical Center, University of Vermont, Burlington, Vermont 05401, USA., Chen B; Institute for Computational Health Sciences, University of California, San Francisco, California 94158, USA., Paik H; Institute for Computational Health Sciences, University of California, San Francisco, California 94158, USA., Aran D; Institute for Computational Health Sciences, University of California, San Francisco, California 94158, USA., Spatz J; Institute for Computational Health Sciences, University of California, San Francisco, California 94158, USA., Himmelstein D; Program in Biological & Medical Informatics, University of California, San Francisco, CA 94158, USA., Panahiazar M; Institute for Computational Health Sciences, University of California, San Francisco, California 94158, USA., Bhattacharya S; Institute for Computational Health Sciences, University of California, San Francisco, California 94158, USA., Sirota M; Institute for Computational Health Sciences, University of California, San Francisco, California 94158, USA., Musen MA; Stanford Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, California 94305, USA., Butte AJ; Institute for Computational Health Sciences, University of California, San Francisco, California 94158, USA.
Language: English
Source: Scientific data [Sci Data] 2017 Sep 19; Vol. 4, pp. 170125. Date of Electronic Publication: 2017 Sep 19.
DOI: 10.1038/sdata.2017.125
Abstract: The Gene Expression Omnibus (GEO) contains more than two million digital samples from functional genomics experiments amassed over almost two decades. However, individual sample meta-data remains poorly described by unstructured free-text attributes, preventing its large-scale reanalysis. We introduce the Search Tag Analyze Resource for GEO as a web application (http://STARGEO.org) to curate better annotations of sample phenotypes uniformly across different studies, and to use these sample annotations to define robust genomic signatures of disease pathology by meta-analysis. In this paper, we target a small group of biomedical graduate students to show rapid crowd-curation of precise sample annotations across all phenotypes, and we demonstrate the biological validity of these crowd-curated annotations for breast cancer. STARGEO.org makes GEO data findable, accessible, interoperable and reusable (i.e., FAIR) to ultimately facilitate knowledge discovery. Our work demonstrates the utility of crowd-curation and interpretation of open 'big data' under FAIR principles as a first step towards realizing an ideal paradigm of precision medicine.
Database: MEDLINE