A genetic algorithm approach for verification of the syllable-based text compression technique
Author: | Göktürk Üçoluk, I. Hakki Toroslu |
---|---|
Year of publication: | 1997 |
Subject: | Agglutinative language; Computer science; Testbed; Process (computing); Library and Information Sciences; Huffman coding; Set (abstract data type); Compression (functional analysis); Genetic algorithm; Syllable; Algorithm; Information Systems; 05 social sciences; 02 engineering and technology; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; 0509 other social sciences; 050904 information & library sciences |
Source: | Journal of Information Science. 23:365-372 |
ISSN: | 1741-6485 0165-5515 |
DOI: | 10.1177/016555159702300503 |
Description: | Provided that an easy mechanism exists for it, it is possible to decompose a text into strings that have lengths greater than one and occur frequently. Having in one hand the set of such frequently occurring strings and in the other the set of letters and symbols, it is possible to compress the text using Huffman coding over an alphabet which is a subset of the union of these two sets. Observations reveal that, in most cases, the maximal inclusion of the strings leads to an optimal length of compressed text. However, the verification of this prediction requires the consideration of all subsets in order to find the one that leads to the best compression. A genetic algorithm is devised and used for this search process. In Turkish texts, because of the agglutinative nature of the language, a highly regular syllable formation provides a useful testbed. |
Database: | OpenAIRE |
External link: |
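
The search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record gives no details of their encoding, fitness function, or genetic operators, so everything here is an assumption. The hypothetical `tokenize` splits text by greedy longest match over the selected strings, falling back to single characters; `huffman_bits` counts only the encoded payload in bits and ignores the cost of storing the code table; `genetic_search` uses bit-mask individuals, tournament selection, one-point crossover, bit-flip mutation, and elitism, with parameter values chosen only for illustration.

```python
import heapq
import random
from collections import Counter
from functools import lru_cache


def tokenize(text, strings):
    """Greedy longest-match split into selected strings and single characters."""
    ordered = sorted((s for s in strings if s), key=len, reverse=True)
    tokens, i = [], 0
    while i < len(text):
        # Fall back to the single character when no selected string matches.
        match = next((s for s in ordered if text.startswith(s, i)), text[i])
        tokens.append(match)
        i += len(match)
    return tokens


def huffman_bits(tokens):
    """Payload length in bits of a Huffman code built from token frequencies."""
    freq = Counter(tokens)
    if len(freq) <= 1:
        return len(tokens)  # assumption: one bit per token for a one-symbol alphabet
    heap = list(freq.values())
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        total += a + b  # each merge adds one bit to every token beneath it
        heapq.heappush(heap, a + b)
    return total


def genetic_search(text, candidates, pop_size=20, generations=30,
                   p_mut=0.05, seed=0):
    """Search subsets of `candidates` (as 0/1 masks) for the shortest encoding."""
    rng = random.Random(seed)
    n = len(candidates)

    @lru_cache(maxsize=None)
    def cost(mask):
        chosen = [c for c, bit in zip(candidates, mask) if bit]
        return huffman_bits(tokenize(text, chosen))

    def pick(pop):
        # Tournament selection of size two.
        return min(rng.sample(pop, 2), key=cost)

    pop = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(pop_size)]
    best = min(pop, key=cost)
    for _ in range(generations):
        nxt = [best]  # elitism: the best mask always survives
        while len(nxt) < pop_size:
            p, q = pick(pop), pick(pop)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = p[:cut] + q[cut:]  # one-point crossover
            child = tuple(b ^ (rng.random() < p_mut) for b in child)
            nxt.append(child)
        pop = nxt
        best = min(pop, key=cost)
    return best, cost(best)
```

Calling `genetic_search(text, candidates)` returns the best mask found and its encoded length in bits. Comparing that length with the cost of the all-ones mask is one way to check, on a given text, the abstract's observation that maximal inclusion of the strings is usually optimal.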