VISTA: an integrated framework for structural variant discovery.

Authors: Sarwal V; Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, United States., Lee S; Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, United States., Yang J; Department of Quantitative and Computational Biology, Dana and David Dornsife College of Letters, Arts and Sciences University of Southern California, 3540 S Figueroa St, Los Angeles, California 90089, United States., Sankararaman S; Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, United States., Chaisson M; Department of Quantitative and Computational Biology, Dana and David Dornsife College of Letters, Arts and Sciences University of Southern California, 3540 S Figueroa St, Los Angeles, California 90089, United States., Eskin E; Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, United States., Mangul S; Department of Quantitative and Computational Biology, Dana and David Dornsife College of Letters, Arts and Sciences University of Southern California, 3540 S Figueroa St, Los Angeles, California 90089, United States.; Department of Clinical Pharmacy, Alfred E. Mann School of Pharmacy, University of Southern California, 1540 Alcazar Street, Los Angeles, CA 90033, United States.
Language: English
Source: Briefings in bioinformatics [Brief Bioinform] 2024 Jul 25; Vol. 25 (5).
DOI: 10.1093/bib/bbae462
Abstract: Structural variation (SV) refers to insertions, deletions, inversions, and duplications in human genomes. SVs are present in approximately 1.5% of the human genome. Still, this small subset of genetic variation has been implicated in the pathogenesis of psoriasis, Crohn's disease and other autoimmune disorders, autism spectrum and other neurodevelopmental disorders, and schizophrenia. Because identifying structural variants is an important problem in genetics, several specialized computational techniques have been developed to detect them directly from sequencing data, and advances in whole-genome sequencing (WGS) technologies have produced a plethora of SV detection methods. However, dissecting SVs from WGS data remains a challenge: the majority of SV detection methods are prone to a high false-positive rate, and no existing method can precisely detect the full range of SVs present in a sample. Previous studies have shown that none of the existing SV callers can maintain high accuracy across various SV lengths and genomic coverages. Here, we report an integrated structural variant calling framework, Variant Identification and Structural Variant Analysis (VISTA), that leverages the results of individual callers using a novel and robust filtering and merging algorithm. In contrast to existing consensus-based tools, which ignore variant length and coverage, VISTA executes various combinations of top-performing callers based on variant length and genomic coverage to generate SV events with high accuracy. We evaluated the performance of VISTA on comprehensive gold-standard datasets across organisms and coverage levels. We benchmarked VISTA using the Genome-in-a-Bottle gold standard SV set, haplotype-resolved de novo assemblies from the Human Pangenome Reference Consortium, and an in-house polymerase chain reaction (PCR)-validated mouse gold standard set. VISTA maintained the highest F1 score among top consensus-based tools, measured using a comprehensive gold standard across both mouse and human genomes. VISTA also has an optimized mode, in which the calls can be optimized for precision or recall. VISTA-optimized can attain 100% precision and the highest sensitivity among variant callers. In conclusion, VISTA represents a significant advancement in structural variant calling, offering a robust and accurate framework that outperforms existing consensus-based tools and sets a new standard for SV detection in genomic research.
(© The Author(s) 2024. Published by Oxford University Press.)
Database: MEDLINE
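
The abstract describes choosing combinations of callers according to variant length and genomic coverage and then filtering and merging their calls. The Python sketch below illustrates that general idea only; it is not the VISTA algorithm, whose details are not given in this record. The caller names, the lookup table `BEST_CALLERS`, the length and coverage cutoffs, the breakpoint tolerance, and the minimum-support rule are all hypothetical assumptions chosen for illustration. The `f1_score` helper uses the standard definition of F1 as the harmonic mean of precision and recall.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    chrom: str
    start: int
    end: int
    svtype: str   # e.g. "DEL", "INS", "INV", "DUP"
    caller: str   # name of the tool that produced the call

    @property
    def length(self) -> int:
        return self.end - self.start


# Hypothetical table: which callers to trust for each (length bin, coverage bin).
# Real choices would come from benchmarking; these names are placeholders.
BEST_CALLERS = {
    ("short", "low"): {"callerA", "callerB"},
    ("short", "high"): {"callerA", "callerC"},
    ("long", "low"): {"callerB", "callerC"},
    ("long", "high"): {"callerB", "callerC"},
}


def length_bin(length: int, cutoff: int = 500) -> str:
    # Assumed cutoff in base pairs, for illustration only.
    return "short" if length < cutoff else "long"


def coverage_bin(coverage: float, cutoff: float = 10.0) -> str:
    # Assumed cutoff in fold coverage, for illustration only.
    return "low" if coverage < cutoff else "high"


def same_event(a: SV, b: SV, tol: int) -> bool:
    # Two calls describe the same event if type and chromosome match and
    # both breakpoints lie within `tol` base pairs of each other.
    return (a.chrom == b.chrom and a.svtype == b.svtype
            and abs(a.start - b.start) <= tol
            and abs(a.end - b.end) <= tol)


def consensus_calls(calls, coverage, min_support=2, tol=100):
    cov = coverage_bin(coverage)
    # Filter: keep only calls from callers selected for this length and coverage.
    kept = [c for c in calls
            if c.caller in BEST_CALLERS[(length_bin(c.length), cov)]]
    kept.sort(key=lambda c: (c.chrom, c.svtype, c.start))

    # Merge: greedily cluster calls that match the first member of a cluster.
    clusters = []
    for c in kept:
        for cluster in clusters:
            if same_event(cluster[0], c, tol):
                cluster.append(c)
                break
        else:
            clusters.append([c])

    merged = []
    for cluster in clusters:
        # Require agreement from at least `min_support` distinct callers.
        if len({c.caller for c in cluster}) >= min_support:
            n = len(cluster)
            merged.append(SV(cluster[0].chrom,
                             sum(c.start for c in cluster) // n,
                             sum(c.end for c in cluster) // n,
                             cluster[0].svtype,
                             "consensus"))
    return merged


def f1_score(precision: float, recall: float) -> float:
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, given two nearby deletion calls from `callerA` and `callerB` and an unrelated call from `callerC`, calling `consensus_calls(calls, coverage=5.0)` under these assumed settings keeps only the event that the two selected callers agree on. Raising `min_support` trades recall for precision, which is one simple way to think about a precision-oriented versus recall-oriented mode.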