Description: |
This thesis makes three contributions to computing science. The first is the recognition that new data types produced by large-scale biological research techniques generate a flood of data, which creates new challenges in data indexing, integration, manipulation and visualisation. The second is a new research methodology that combines orthogonal persistence with an empirical evaluation of disk-resident suffix indexes. This methodology allowed us to develop a practical algorithm for constructing suffix trees on disk, up to any size supported by the available file and addressing space, which had not previously been possible. The third is a new experimental methodology for examining the usefulness of suffix indexes, together with its use in an empirical investigation of the indexing gain achieved by combining an approximate matching algorithm with a large suffix index. These results are presented against the background of the changing technological landscape affecting life sciences and bioinformatics research, and the resulting need for new computing solutions. |