Popis: |
As the cost of DNA sequencing decreases, personal genomic data are becoming more abundant. Genomic data are known to be highly identifying; even a few genetic mutations can identify an individual. Therefore, leakage of genetic information and the associated metadata creates privacy risks. While these risks are well known, most of the basic analysis methods are not privacy-aware. One such fundamental method is the hidden Markov model (HMM), which is especially important for comparative genomics because genetic data are sequential in nature, e.g., DNA/RNA nucleotide sequences and protein residue sequences. HMMs are used mainly for comparing and aligning DNA/RNA and protein sequences, such as viral genomes or gene sequences: similar portions of the sequences are identified and considered conserved throughout evolution, whereas non-matching portions indicate divergence. HMM-based inference of sequence alignment is therefore a vital component of sequence analysis. Here, we describe SHiMMer, a secure HMM evaluation method that guarantees cryptographic security while HMMs are used for sequence comparison. We used simulated data for alignment of genomic sequences to demonstrate that SHiMMer can perform sequence alignment efficiently. We present the scaling of time and memory requirements with increasing numbers of alignment states and sequence lengths. |