Heuristics for Compression of Malformed XML

Author: Szabó, Mária
Language: Czech
Year of publication: 2008
Document type: masterThesis
Description: XBW [9] is a modular application for lossless text compression that allows several compression algorithms to be combined. Its best results were achieved by combining an XML parser with the Burrows-Wheeler transform (BWT); the name XBW merges the abbreviations XML and BWT. This thesis therefore focuses on improving the results obtained in combination with BWT. On files of about 20 MB, generated by concatenating hundreds of web pages, we achieve 37% faster compression at the cost of a 5% worse compression ratio. This compression ratio is nevertheless 38% better than that of the Rar software. The speed-up was achieved with a new type of parser based on dictionaries of tags and elements. The thesis also contains a new, completely rewritten implementation of the original parser, based on the same principle of tag and attribute dictionaries. With this reimplementation we improved the average compression speed by 4% and the average compression ratio by 2%.
Database: Networked Digital Library of Theses & Dissertations