CAREx: context-aware read extension of paired-end sequencing data

Authors: Felix Kallenborn, Bertil Schmidt
Language: English
Year of publication: 2024
Subject:
Source: BMC Bioinformatics, Vol 25, Iss 1, Pp 1-18 (2024)
Document type: article
ISSN: 1471-2105
DOI: 10.1186/s12859-024-05802-w
Description: Abstract. Background: Commonly used next-generation sequencing machines typically produce large amounts of short reads of a few hundred base pairs in length. However, many downstream applications would generally benefit from longer reads. Results: We present CAREx, an algorithm for the generation of pseudo-long reads from paired-end short-read Illumina data, based on the concept of repeatedly computing multiple-sequence alignments (MSAs) to extend a read until its partner is found. Our performance evaluation on both simulated and real data shows that CAREx is able to connect significantly more read pairs (up to 99% for simulated data) and to produce more error-free pseudo-long reads than previous approaches. When used prior to assembly, it can achieve superior de novo assembly results. Furthermore, the GPU-accelerated version of CAREx exhibits the fastest execution times among all tested tools. Conclusion: CAREx is a new MSA-based algorithm and software for producing pseudo-long reads from paired-end short-read data. It outperforms other state-of-the-art programs in terms of (i) percentage of connected read pairs, (ii) reduction of error rates of filled gaps, (iii) runtime, and (iv) downstream analysis using de novo assembly. CAREx is open-source software written in C++ (CPU version) and in CUDA/C++ (GPU version). It is licensed under GPLv3 and can be downloaded at https://github.com/fkallen/CAREx.
Database: Directory of Open Access Journals
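
Illustrative note: the abstract describes extending a read step by step until its partner is found. The following is a minimal toy sketch of that general idea, not the CAREx implementation. It assumes error-free exact suffix/prefix overlaps, candidate reads already oriented on the forward strand, and a simple column-wise majority vote standing in for a real multiple-sequence alignment. All names (`reverse_complement`, `extend_once`, `extend_until_mate`) and the example sequences are hypothetical.

```python
from collections import Counter


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(seq))


def extend_once(anchor, reads, min_overlap):
    """Majority-vote the bases that overlapping reads place past the anchor's end.

    Toy stand-in for an MSA: a read takes part only if a prefix of it
    exactly matches a suffix of the anchor (at least min_overlap bases).
    """
    columns = []  # columns[i] counts bases seen i positions past the anchor end
    for r in reads:
        # Try the longest possible overlap first; the read must overhang.
        for ov in range(min(len(anchor), len(r) - 1), min_overlap - 1, -1):
            if anchor.endswith(r[:ov]):
                for i, base in enumerate(r[ov:]):
                    if i == len(columns):
                        columns.append(Counter())
                    columns[i][base] += 1
                break
    return "".join(col.most_common(1)[0][0] for col in columns)


def extend_until_mate(read, mate, pool, min_overlap=5, max_length=1000):
    """Extend `read` until the mate (given on the opposite strand) is reached.

    Returns the pseudo-long read, or None if extension stalls or the
    length limit is hit before the mate is found.
    """
    target = reverse_complement(mate)
    extended = read
    while len(extended) < max_length:
        pos = extended.find(target)
        if pos != -1:
            return extended[: pos + len(target)]
        tail = extend_once(extended, pool, min_overlap)
        if not tail:
            return None  # no read overlaps the current end
        extended += tail
    return None


# Hypothetical 30-base fragment with a read pair at its two ends.
genome = "ACGTTGCAAGGCTTACCGATAGTCATGCCA"
read = genome[0:10]
mate = reverse_complement(genome[20:30])

# Candidate reads tiling the fragment, plus one read with a substitution
# error that the majority vote should outvote.
pool = [genome[p:p + 10] for p in range(2, 21, 2)]
noisy_read = genome[4:11] + "A" + genome[12:14]
pool.append(noisy_read)

result = extend_until_mate(read, mate, pool)
```

In this toy setting `result` reconstructs the whole fragment even though `noisy_read` carries a wrong base, because two error-free reads cover the same column. A real tool must additionally handle sequencing errors inside the overlap, reads from both strands, and repeats, which is where a proper alignment step is needed.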