Abstract: |
Fusion genes and transcripts can serve as biological markers and can also drive tumor development and progression. Modern algorithms and high-throughput sequencing complement each other in addressing questions of tumor origin and cancer detection, as well as the fundamental question of how fusion genes arise and how they influence molecular processes in the cell. A wide range of algorithms for fusion gene detection has been developed; they differ in computing speed, sensitivity, specificity, and the experimental design they target. Depending on read length (short reads, 50–300 bp; long reads, 5,000–100,000 bp), bioinformatic algorithms fall into three main types: those that work exclusively with short-read sequencing, those that work exclusively with long-read sequencing, and those that combine the results of both. These algorithms are further subdivided into: 1) alignment-first approaches (STAR-Fusion, Arriba), which map reads directly to the genome or transcriptome and then search for reads that support a fusion gene or transcript; 2) assembly-first approaches (Fusion-Bloom), which assemble the genome or transcriptome de novo from overlapping reads and then compare the assembly with the reference transcriptome or genome to find transcripts or genes that are absent from the reference and therefore warrant further investigation; 3) pseudoalignment approaches, which do not perform local alignment but instead use a precomputed index of all reference transcripts to find the transcript subsequence closest to each read seed. This article describes the main classes of available software tools for fusion gene detection and outlines their characteristics, advantages, and disadvantages. To date, assembly-first algorithms remain the slowest and most resource-intensive. 
Alignment-first (mapping-first) approaches are fast and fairly accurate at detecting fusion genes. Pseudoalignment algorithms are the fastest and least resource-demanding, but their speed comes at the expense of search quality. [ABSTRACT FROM AUTHOR] |
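
The pseudoalignment idea summarized in the abstract (look up read seeds in a precomputed index of reference transcripts instead of performing local alignment) can be illustrated with a minimal Python sketch. This is a toy under stated assumptions, not the method of any tool named above: the sequences `GENE_A` and `GENE_B`, the k-mer length `K = 5`, and the function names `build_index`, `pseudoalign`, and `fusion_candidate` are all hypothetical, and the rule "first and last matching k-mers point to disjoint transcript sets" is a deliberately simplified stand-in for real fusion-calling logic.

```python
from collections import defaultdict

K = 5  # toy k-mer length (assumption); real tools use much longer seeds

# Hypothetical toy reference transcripts that share no 5-mers.
TRANSCRIPTS = {
    "GENE_A": "ATGGCGTACCTGAAGTCC",
    "GENE_B": "TTGACCGGATCAGCTTAA",
}

def build_index(transcripts, k=K):
    """Precompute a map from every k-mer to the transcripts containing it."""
    index = defaultdict(set)
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index

def pseudoalign(read, index, k=K):
    """For each k-mer in the read, return the set of compatible transcripts.

    No base-level alignment is computed; this is a pure index lookup.
    """
    return [index.get(read[i:i + k], set()) for i in range(len(read) - k + 1)]

def fusion_candidate(read, index, k=K):
    """Return (left transcripts, right transcripts) if the first and last
    matching k-mers of the read are compatible with disjoint transcript sets,
    otherwise None. Simplified illustrative rule only.
    """
    hits = [h for h in pseudoalign(read, index, k) if h]
    if not hits:
        return None  # read shares no k-mer with the reference
    first, last = hits[0], hits[-1]
    if first.isdisjoint(last):
        return (sorted(first), sorted(last))
    return None

index = build_index(TRANSCRIPTS)

# A chimeric read: start of GENE_A joined to the end of GENE_B.
FUSION_READ = TRANSCRIPTS["GENE_A"][:10] + TRANSCRIPTS["GENE_B"][8:]
# A read drawn entirely from GENE_A.
NORMAL_READ = TRANSCRIPTS["GENE_A"][2:14]

print(fusion_candidate(FUSION_READ, index))  # (['GENE_A'], ['GENE_B'])
print(fusion_candidate(NORMAL_READ, index))  # None
```

Because each read costs only a handful of dictionary lookups, the sketch also hints at the trade-off the abstract describes: the search is fast, but without base-level alignment the exact breakpoint is not resolved and chance k-mer matches are not filtered out.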