Large-scale sequence comparisons with sourmash.

Authors: Pierce NT, Irber L, Reiter T, Brooks P, Brown CT; Department of Population Health and Reproduction, University of California, Davis, Davis, California, 95616, USA.
Language: English
Source: F1000Research [F1000Res] 2019 Jul 04; Vol. 8, pp. 1006. Date of Electronic Publication: 2019 Jul 04 (Print Publication: 2019).
DOI: 10.12688/f1000research.19675.1
Abstract: The sourmash software package uses MinHash-based sketching to create "signatures", compressed representations of DNA, RNA, and protein sequences, that can be stored, searched, explored, and taxonomically annotated. sourmash signatures can be used to estimate sequence similarity between very large data sets quickly and in low memory, and can be used to search large databases of genomes for matches to query genomes and metagenomes. sourmash is implemented in C++, Rust, and Python, and is freely available under the BSD license at http://github.com/dib-lab/sourmash.
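Note: The MinHash-based sketching mentioned in the abstract can be illustrated with a minimal sketch in plain Python. It is not sourmash's implementation and does not use the sourmash API. The function names (kmers, hash_kmer, sketch, jaccard_estimate), the SHA-1-based hash, and the defaults k=21 and num=500 are assumptions chosen for illustration, and reverse-complement handling is omitted. The idea shown is a bottom-k sketch: keep only the smallest k-mer hashes of each sequence, then estimate Jaccard similarity from those small sets instead of from all k-mers.

```python
import hashlib


def kmers(seq, k):
    """Yield every overlapping k-mer of seq (upper-cased)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer (illustrative hash, an assumption)."""
    digest = hashlib.sha1(kmer.encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big")


def sketch(seq, k=21, num=500):
    """Bottom-k MinHash sketch: the `num` smallest distinct k-mer hashes."""
    hashes = {hash_kmer(km) for km in kmers(seq, k)}
    return sorted(hashes)[:num]


def jaccard_estimate(sk_a, sk_b, num=500):
    """Estimate Jaccard similarity from two bottom-k sketches."""
    a, b = set(sk_a), set(sk_b)
    # The `num` smallest hashes of the union form a sketch of the union.
    merged = sorted(a | b)[:num]
    if not merged:
        return 0.0
    shared = sum(1 for h in merged if h in a and h in b)
    return shared / len(merged)


if __name__ == "__main__":
    # Toy sequences; with fewer distinct k-mers than `num`, the estimate
    # coincides with the exact Jaccard similarity of the k-mer sets.
    sk1 = sketch("ACGTACGTAC", k=4)
    sk2 = sketch("ACGTTTTT", k=4)
    print(jaccard_estimate(sk1, sk2))
```

Because each sketch holds at most `num` integers regardless of input size, comparing two sketches takes little memory, which is the property the abstract refers to when it mentions fast, low-memory similarity estimates.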
Competing Interests: No competing interests were disclosed.
Database: MEDLINE