Tximeta: Reference sequence checksums for provenance identification in RNA-seq.

Authors: Love MI; Department of Biostatistics, University of North Carolina-Chapel Hill, Chapel Hill, North Carolina, United States of America.; Department of Genetics, University of North Carolina-Chapel Hill, Chapel Hill, North Carolina, United States of America., Soneson C; Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland.; SIB Swiss Institute of Bioinformatics, Basel, Switzerland., Hickey PF; Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia.; The Department of Medical Biology, University of Melbourne, Parkville, Victoria, Australia., Johnson LK; Department of Population Health and Reproduction, University of California, Davis, Davis, California, United States of America., Pierce NT; Department of Population Health and Reproduction, University of California, Davis, Davis, California, United States of America., Shepherd L; Roswell Park Comprehensive Cancer Center, Buffalo, New York, United States of America., Morgan M; Roswell Park Comprehensive Cancer Center, Buffalo, New York, United States of America., Patro R; Department of Computer Science, University of Maryland, College Park, Maryland, United States of America.
Language: English
Source: PLoS computational biology [PLoS Comput Biol] 2020 Feb 25; Vol. 16 (2), pp. e1007664. Date of Electronic Publication: 2020 Feb 25 (Print Publication: 2020).
DOI: 10.1371/journal.pcbi.1007664
Abstract: Correct annotation metadata is critical for reproducible and accurate RNA-seq analysis. When files are shared publicly or among collaborators with incorrect or missing annotation metadata, it becomes difficult or impossible to reproduce bioinformatic analyses from raw data. It also makes it more difficult to locate the transcriptomic features, such as transcripts or genes, in their proper genomic context, which is necessary for overlapping expression data with other datasets. We provide a solution in the form of an R/Bioconductor package tximeta that performs numerous annotation and metadata gathering tasks automatically on behalf of users during the import of transcript quantification files. The correct reference transcriptome is identified via a hashed checksum stored in the quantification output, and key transcript databases are downloaded and cached locally. The computational paradigm of automatically adding annotation metadata based on reference sequence checksums can greatly facilitate genomic workflows, by helping to reduce overhead during bioinformatic analyses, preventing costly bioinformatic mistakes, and promoting computational reproducibility. The tximeta package is available at https://bioconductor.org/packages/tximeta.
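The checksum idea described in the abstract can be illustrated with a minimal Python sketch. This is not the tximeta implementation (tximeta is an R/Bioconductor package), and the abstract does not specify the hash function or how sequences are combined; SHA-256 over newline-separated sequences is an assumption made here. The function names, the registry layout, and the example sequences and metadata are all hypothetical.

```python
import hashlib


def reference_checksum(sequences):
    """Hash an ordered list of reference sequences (assumed scheme: SHA-256).

    Each sequence is followed by a newline so that sequence boundaries
    contribute to the digest.
    """
    h = hashlib.sha256()
    for seq in sequences:
        h.update(seq.encode("ascii"))
        h.update(b"\n")
    return h.hexdigest()


def build_registry(known):
    """Map checksum -> metadata for a list of (metadata, sequences) pairs."""
    return {reference_checksum(seqs): meta for meta, seqs in known}


def identify(checksum, registry):
    """Return the metadata for a checksum, or None if it is not recognised."""
    return registry.get(checksum)


# Hypothetical registry of known reference transcriptomes (made-up data).
known = [
    ({"source": "ExampleSourceA", "release": "1"}, ["ACGTACGT", "GGCCTTAA"]),
    ({"source": "ExampleSourceB", "release": "7"}, ["TTTTAAAA", "CCCCGGGG"]),
]
registry = build_registry(known)

# A checksum as it might be stored alongside quantification output.
stored = reference_checksum(["ACGTACGT", "GGCCTTAA"])
match = identify(stored, registry)

# A reference that differs by a single base is not recognised.
unknown = identify(reference_checksum(["ACGTACGA", "GGCCTTAA"]), registry)
```

Because the lookup depends only on the sequence content, a quantification file can be matched to its annotation even when file names or paths have changed; an unrecognised checksum simply returns no metadata rather than a guess.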
Competing Interests: I have read the journal’s policy and the authors of this manuscript have the following competing interests: RP is a co-founder of Ocean Genomics.
Database: MEDLINE