Description: |
Recent technological developments, such as high-throughput sequencing, have made it possible to sequence the genomes of many living organisms. It has also become possible to extract and sequence DNA from extinct organisms. Compared with modern DNA, ancient DNA is harder to analyse computationally because the sequenced fragments tend to be short, degraded and contaminated with extraneous environmental sequences, such as bacterial and modern human DNA. Endogenous sequences are generally identified within this mixture by aligning the reads to a reference genome. However, existing alignment software does not handle these ultra-short, chemically damaged sequences well. To deal with such samples, a new aligner, R-Candy (U. Stenzel, unpublished), has been implemented to align ultra-short reads and cope with high levels of chemical damage. For pattern matching it uses a self-index data structure, an FM-index based on the Burrows-Wheeler Transform. This thesis evaluates the accuracy and performance of R-Candy on simulated ancient DNA sequences and compares it with BWA, currently the most commonly used aligner for ancient DNA. On simulated data, R-Candy outperformed BWA (run with both default and customized parameters): it correctly aligned more endogenous reads, even in the presence of extensive deamination, and incorrectly aligned fewer exogenous reads. Future development of R-Candy will focus on increasing its speed by improving the search algorithm and adding support for multi-threading. |
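To make the FM-index idea concrete, here is a minimal Python sketch of an FM-index over a toy reference. It builds the Burrows-Wheeler Transform from sorted rotations, a `C` table (number of characters smaller than each symbol) and an occurrence table, then counts pattern occurrences by backward search. The `count_deaminated` method is a hypothetical, simplified way of tolerating C-to-T deamination: every T in the read may also match a C in the reference. It is only an illustration under these assumptions. It does not describe R-Candy's actual search algorithm, which is unpublished, and it is not meant for genome-sized inputs.

```python
from collections import Counter


def bwt(text: str) -> str:
    """Burrows-Wheeler Transform via sorted rotations (toy inputs only)."""
    assert text.endswith("$"), "text must end with the unique sentinel '$'"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


class FMIndex:
    def __init__(self, text: str):
        if not text.endswith("$"):
            text += "$"  # '$' sorts before A, C, G, T in ASCII
        self.bwt = bwt(text)
        counts = Counter(text)
        # C[ch] = number of characters in the text that are smaller than ch
        self.C = {}
        total = 0
        for ch in sorted(counts):
            self.C[ch] = total
            total += counts[ch]
        # occ[ch][i] = number of occurrences of ch in bwt[:i]
        self.occ = {ch: [0] * (len(self.bwt) + 1) for ch in counts}
        for i, b in enumerate(self.bwt):
            for ch in self.occ:
                self.occ[ch][i + 1] = self.occ[ch][i] + (ch == b)

    def _step(self, ch: str, lo: int, hi: int):
        """One backward-search step: narrow the row range [lo, hi) by prepending ch."""
        return self.C[ch] + self.occ[ch][lo], self.C[ch] + self.occ[ch][hi]

    def count(self, pattern: str) -> int:
        """Exact matching: number of occurrences of pattern in the text."""
        lo, hi = 0, len(self.bwt)
        for ch in reversed(pattern):
            if ch not in self.C:
                return 0
            lo, hi = self._step(ch, lo, hi)
            if lo >= hi:
                return 0
        return hi - lo

    def count_deaminated(self, read: str) -> int:
        """Hypothetical damage-tolerant count: each read 'T' may come from a reference 'C'."""
        def search(i: int, lo: int, hi: int) -> int:
            if lo >= hi:
                return 0
            if i < 0:
                return hi - lo
            candidates = ("T", "C") if read[i] == "T" else (read[i],)
            total = 0
            for ref_ch in candidates:
                if ref_ch in self.C:
                    total += search(i - 1, *self._step(ref_ch, lo, hi))
            return total  # branches give disjoint row ranges, so the sum is exact

        return search(len(read) - 1, 0, len(self.bwt))


if __name__ == "__main__":
    idx = FMIndex("GATTACAGATTACA")
    print(idx.count("GATTACA"))  # 2
    ref = FMIndex("ACGTCAGCA")
    print(ref.count("TAG"), ref.count_deaminated("TAG"))  # 0 1 (read TAG matches reference CAG)
```

Backward search processes the pattern from its last character to its first, and each step needs only two table lookups per character. This is why FM-index aligners can search a large reference without scanning it. Branching on every T grows exponentially with the number of Ts in a read, so a practical damage-aware aligner would need pruning or scoring, which this sketch does not attempt.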