An Exact Algorithm to Compute the Double-Cut-and-Join Distance for Genomes with Duplicate Genes
Autor: | Bernard M. E. Moret, Mingfu Shao, Yu Lin |
---|---|
Předmět: |
Genome evolution
Theoretical computer science DCJ distance orthology assignment Genome Mice Genes Duplicate adjacency graph Genetics Preprocessor Animals Humans maximum cycle decomposition Molecular Biology Time complexity Integer programming Mathematics Basis (linear algebra) Models Genetic Genomics Programming Linear Biological Evolution Rats Computational Mathematics Exact algorithm ComputingMethodologies_PATTERNRECOGNITION Computational Theory and Mathematics Modeling and Simulation Edit distance Algorithms |
Popis: | Computing the edit distance between two genomes is a basic problem in the study of genome evolution. The double-cut-and-join (DCJ) model has formed the basis for most algorithmic research on rearrangements over the last few years. The edit distance under the DCJ model can be computed in linear time for genomes without duplicate genes, while the problem becomes NP-hard in the presence of duplicate genes. In this article, we propose an integer linear programming (ILP) formulation to compute the DCJ distance between two genomes with duplicate genes. We also provide an efficient preprocessing approach to simplify the ILP formulation while preserving optimality. Comparison on simulated genomes demonstrates that our method outperforms MSOAR in computing the edit distance, especially when the genomes contain long duplicated segments. We also apply our method to assign orthologous gene pairs among human, mouse, and rat genomes, where once again our method outperforms MSOAR. |
Databáze: | OpenAIRE |
Externí odkaz: |