Description: |
A lossless compression algorithm for genetic sequences, based on searching for exact palindromes, is reported. The compression results obtained with the algorithm show that exact palindromes are one of the main hidden regularities in DNA sequences. The proposed DNA sequence compression algorithm searches for genetic palindrome substrings and builds, online, a Library file that acts as a lookup table. Each genetic palindrome substring is replaced by a corresponding ASCII character, starting from code 33 (!). The substring length is chosen by the user. Protecting data from unauthorized users is one of the most challenging problems in information security. The algorithm can also provide data security through the ASCII encoding and the online Library file, which acts as a signature. The algorithm is tested on benchmark DNA sequences, on the reverse, the complement, and the reverse complement of those benchmark sequences, and on artificial DNA sequences. It can approach a compression rate of 3.851273 bits/base. |
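The substitution scheme described above can be illustrated with a minimal Python sketch. It is not the authors' implementation; several details are assumptions made only for illustration: a "genetic palindrome" is taken to be a substring equal to its own reverse complement (for example GAATTC), the scan is a greedy left-to-right pass with a fixed window length `k`, the Library is an in-memory dictionary rather than a file, and codes that collide with the bases A, C, G, T are skipped so that decoding stays unambiguous. The function names are hypothetical.

```python
# Illustrative sketch only; names and strategy are assumptions, not the published method.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_genetic_palindrome(seq):
    """Assumed definition: the substring equals its own reverse complement."""
    return seq == reverse_complement(seq)


def compress(seq, k):
    """Greedy scan: replace each length-k genetic palindrome by one ASCII character.

    Codes start at 33 ('!') and skip 'A', 'C', 'G', 'T' so that the encoded
    text can be decoded without ambiguity. Returns (encoded, library), where
    library maps each code character back to the substring it replaces.
    """
    codes = (chr(c) for c in range(33, 127) if chr(c) not in "ACGT")
    code_for = {}  # palindrome substring -> code character
    out = []
    i = 0
    while i < len(seq):
        window = seq[i:i + k]
        if len(window) == k and is_genetic_palindrome(window):
            if window not in code_for:
                code = next(codes, None)
                if code is None:
                    # Out of printable codes: fall back to copying the raw base.
                    out.append(seq[i])
                    i += 1
                    continue
                code_for[window] = code
            out.append(code_for[window])
            i += k
        else:
            out.append(seq[i])
            i += 1
    library = {code: sub for sub, code in code_for.items()}
    return "".join(out), library


def decompress(encoded, library):
    """Invert compress(): expand code characters, copy plain bases unchanged."""
    return "".join(library.get(ch, ch) for ch in encoded)


encoded, library = compress("GAATTCAGGAATTC", 6)
restored = decompress(encoded, library)
```

In this sketch the encoded text cannot be expanded without the `library` mapping, which is one way to read the statement that the Library file acts as a signature.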