Additional file 1 of Transformation of alignment files improves performance of variant callers for long-read RNA sequencing data

Authors: de Souza, Vladimir B. C., Jordan, Ben T., Tseng, Elizabeth, Nelson, Elizabeth A., Hirschi, Karen K., Sheynkman, Gloria, Robinson, Mark D.
Publication year: 2023
DOI: 10.6084/m9.figshare.22688604.v1
Description: Additional file 1: Fig. S1. IGV screenshot of three representative BAM files of Iso-Seq reads aligned to the reference genome, in which supplementary alignments are hidden. Fig. S2. Precision-recall plot of DeepVariant (DV)-based pipelines on Iso-Seq data (PacBio lrRNA-seq), for each dataset (Jurkat or WTC-11), separated by variant type (indels or SNPs). Fig. S3. Relationship between the proportion of N-cigar (i.e., intron-containing) reads and Iso-Seq read coverage (WTC-11 dataset). Fig. S4. Precision-recall plots of DeepVariant (DV)-based pipelines for variant calling from Iso-Seq data (PacBio lrRNA-seq), according to the proportion of intron-containing (N-cigar) reads (point sizes). Fig. S5. Precision-recall plot of Clair3-based pipelines using both the pileup and full-alignment models on Iso-Seq data (PacBio lrRNA-seq), for each dataset (Jurkat or WTC-11), separated by variant type (indels or SNPs). Fig. S6. Precision-recall plot of Clair3-based pipelines using the pileup-only model on Iso-Seq data (PacBio lrRNA-seq), for each dataset (Jurkat or WTC-11), separated by variant type (indels or SNPs). Fig. S7. Precision-recall plot of the SNCR+NanoCaller pipeline on Iso-Seq data (PacBio lrRNA-seq), for each dataset (Jurkat or WTC-11), separated by variant type (indels or SNPs). Fig. S8. Variant calling performance on Nanopore lrRNA-seq data. Fig. S9. Variant calling performance on Illumina RNA-seq data. Table S1. Number of true indels and SNPs covered by Iso-Seq data, in each read coverage range used in the mini-benchmark, for the Jurkat and WTC-11 datasets. Table S2. Performance measures (precision, recall, and F1 score) of the best tested pipelines (SNCR+flagCorrection+DeepVariant, Clair3-mix, and SNCR+GATK), for each dataset (Jurkat and WTC-11), separated by variant type (indels and SNPs), using different thresholds for minimum Iso-Seq read coverage (Min_coverage). Table S3. Number of true indels and SNPs covered by Nanopore lrRNA-seq data, in each read coverage range used in the mini-benchmark, for the WTC-11 dataset. Table S4. Number of true indels and SNPs covered by Illumina RNA-seq data, in each read coverage range used in the mini-benchmark, for the WTC-11 dataset.
Database: OpenAIRE