Author:
Beuf Kristof, Schrijver Joachim, Thas Olivier, Criekinge Wim, Irizarry Rafael A, Clement Lieven
Language:
English
Year of publication:
2012
Source:
BMC Bioinformatics, Vol 13, Iss 1, p 303 (2012)
Document type:
article
ISSN:
1471-2105
DOI:
10.1186/1471-2105-13-303
Description:
Abstract
Background: 454 pyrosequencing is a commonly used massively parallel DNA sequencing technology with a wide variety of application fields, such as epigenetics, metagenomics and transcriptomics. A well-known problem of this platform is its sensitivity to base-calling insertion and deletion errors, particularly in the presence of long homopolymers. In addition, the base-call quality scores are not informative with respect to whether an insertion or a deletion error is more likely. Surprisingly, little effort has been devoted to developing improved base-calling methods and more intuitive quality scores for this platform.
Results: We present HPCall, a 454 base-calling method based on a weighted Hurdle Poisson model. HPCall uses a probabilistic framework to call the homopolymer lengths in the sequence by modeling well-known 454 noise predictors. Base-calling quality is assessed from the estimated probabilities for each homopolymer length, which are easily transformed into useful quality scores.
Conclusions: Using a reference data set of the Escherichia coli K-12 strain, we show that HPCall produces superior quality scores that are very informative about possible insertion and deletion errors, while maintaining a base-calling accuracy better than that of the current 454 base-caller. Given the generality of the framework, HPCall also has the potential to be adapted to other homopolymer-sensitive sequencing technologies.
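The abstract describes calling each homopolymer length from a hurdle Poisson model and turning the per-length probabilities into quality scores. The following is a minimal sketch of that idea, not HPCall's implementation. It assumes the model has already produced, for a single flow, a probability `pi0` of a zero-length homopolymer and a Poisson rate `lam` for the nonzero part. The weighting and the 454 noise predictors that HPCall uses to estimate these quantities are omitted. The sketch calls the most probable length and reports a Phred-style quality, together with the probabilities that the call overstates the true length (an insertion) or understates it (a deletion). The function names, the `max_len` cutoff, and the Phred transform are illustrative assumptions.

```python
import math


def hurdle_poisson_pmf(k: int, pi0: float, lam: float) -> float:
    """P(Y = k) under a hurdle Poisson: point mass pi0 at zero and a
    zero-truncated Poisson(lam) for k >= 1 (assumes 0 < pi0 < 1, lam > 0)."""
    if k < 0:
        return 0.0
    if k == 0:
        return pi0
    # Zero-truncated Poisson: Poisson pmf renormalised over k >= 1.
    pois = math.exp(-lam) * lam ** k / math.factorial(k)
    return (1.0 - pi0) * pois / (1.0 - math.exp(-lam))


def call_homopolymer(pi0: float, lam: float, max_len: int = 20):
    """Hypothetical caller: returns (called length, Phred-style quality,
    P(insertion error), P(deletion error)) for one flow."""
    probs = [hurdle_poisson_pmf(k, pi0, lam) for k in range(max_len + 1)]
    total = sum(probs)
    probs = [p / total for p in probs]  # renormalise after truncating at max_len

    # Call the most probable homopolymer length.
    call = max(range(len(probs)), key=probs.__getitem__)

    # True length shorter than the call -> the read contains inserted bases.
    p_ins = sum(probs[:call])
    # True length longer than the call -> the read is missing bases.
    p_del = sum(probs[call + 1:])

    # Phred-style quality from the probability that the call is wrong,
    # floored at 1e-10 so the quality is capped at 100.
    p_err = 1.0 - probs[call]
    quality = -10.0 * math.log10(max(p_err, 1e-10))
    return call, quality, p_ins, p_del


if __name__ == "__main__":
    call, q, p_ins, p_del = call_homopolymer(pi0=0.01, lam=3.5)
    print(call, round(q, 2), round(p_ins, 3), round(p_del, 3))
```

With `pi0=0.01` and `lam=3.5` this calls a length of 3. Reporting `p_ins` and `p_del` separately, rather than only the single error probability behind the quality score, is one way to make the score informative about which kind of error is more likely, which is the property the abstract emphasises.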
Database:
Directory of Open Access Journals