Simrank: Rapid and sensitive general-purpose k-mer search tool

Author:	Brodie Eoin L, Singh Navjeet NS, Alekseyenko Alexander V, Karaoz Ulas, Keller Keith, DeSantis Todd Z, Pei Zhiheng, Andersen Gary L, Larsen Niels
Language:	English
Year of publication:	2011
Subject:	Ecology QH540-549.5
Source:	BMC Ecology, Vol 11, Iss 1, p 11 (2011)
Document type:	article
ISSN:	1472-6785
DOI:	10.1186/1472-6785-11-11
Description:	Abstract. Background: Terabyte-scale collections of string-encoded data are expected from consortium efforts such as the Human Microbiome Project (http://nihroadmap.nih.gov/hmp). Rapid k-mer matching strategies enable intra- and inter-project data similarity searches. Software applications for sequence database partitioning, guide tree estimation, molecular classification and alignment acceleration have benefited from embedded k-mer searches as subroutines. However, a rapid, general-purpose, open-source, flexible, stand-alone k-mer tool has not been available. Results: Here we present a stand-alone utility, Simrank, which allows users to rapidly identify the database strings most similar to query strings. Performance testing of Simrank and related tools against DNA, RNA, protein and human-language datasets found Simrank to be 10X to 928X faster, depending on the dataset. Conclusions: Simrank provides molecular ecologists with a high-throughput, open-source choice for comparing large sequence sets to find similarity.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/fc70757924234616a225a0ee18541c2f
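The abstract describes ranking database strings by their k-mer similarity to a query. The sketch below illustrates that general idea under stated assumptions and is not Simrank's actual implementation. It builds an inverted index from each distinct k-mer to the database entries containing it, then scores each entry by the fraction of the query's distinct k-mers it shares. The scoring rule and the function names `kmers`, `build_index` and `rank` are hypothetical choices made for illustration; Simrank's real scoring and data structures may differ.

```python
from collections import defaultdict


def kmers(s, k):
    """Return the set of distinct k-length substrings of s (hypothetical helper)."""
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def build_index(database, k):
    """Inverted index: map each k-mer to the IDs of database strings containing it."""
    index = defaultdict(set)
    for seq_id, seq in database.items():
        for kmer in kmers(seq, k):
            index[kmer].add(seq_id)
    return index


def rank(query, index, k, top=5):
    """Score database strings by the fraction of the query's distinct k-mers they share.

    Assumed scoring rule for illustration only: shared k-mers / query k-mers.
    Strings sharing no k-mer with the query are never visited, which is what
    makes an inverted index faster than comparing against every entry.
    """
    q = kmers(query, k)
    if not q:
        return []
    shared = defaultdict(int)
    for kmer in q:
        for seq_id in index.get(kmer, ()):
            shared[seq_id] += 1
    scores = [(seq_id, count / len(q)) for seq_id, count in shared.items()]
    # Highest score first; ties broken by ID for deterministic output.
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores[:top]


db = {"s1": "ACGTACGTGG", "s2": "TTTTACGTAA", "s3": "GGGGGGGGGG"}
index = build_index(db, k=4)
print(rank("ACGTACGT", index, k=4))  # [('s1', 1.0), ('s2', 0.75)]
```

In this toy run, the query has four distinct 4-mers. `s1` contains all four and `s2` contains three, while `s3` shares none and is never scored.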