Simulation of African and non-African low and high coverage whole genome sequence data to assess variant calling approaches.
Authors: | Alosaimi S; Faculty of Health Sciences, Division of Human Genetics, Department of Pathology, University of Cape Town, Cape Town, South Africa., van Biljon N; Department of Statistical Sciences, University of Cape Town, Cape Town, South Africa., Awany D; Faculty of Health Sciences, Division of Human Genetics, Department of Pathology, University of Cape Town, Cape Town, South Africa., Thami PK; Faculty of Health Sciences, Division of Human Genetics, Department of Pathology, University of Cape Town, Cape Town, South Africa., Defo J; Faculty of Health Sciences, Division of Human Genetics, Department of Pathology, University of Cape Town, Cape Town, South Africa., Mugo JW; Faculty of Health Sciences, Division of Computational Biology, Department of Biomedical Sciences, University of Cape Town, Cape Town, South Africa., Bope CD; Faculty of Sciences, Department of Mathematics and Computer Science, University of Kinshasa, Kinshasa, DRC., Mazandu GK; Faculty of Health Sciences, Division of Human Genetics, Department of Pathology, University of Cape Town, Cape Town, South Africa.; Faculty of Health Sciences, Division of Computational Biology, Department of Biomedical Sciences, University of Cape Town, Cape Town, South Africa., Mulder NJ; Faculty of Health Sciences, Division of Computational Biology, Department of Biomedical Sciences, University of Cape Town, Cape Town, South Africa.; Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Anzio Road, Observatory, Cape Town 7925, South Africa., Chimusa ER; Faculty of Health Sciences, Division of Human Genetics, Department of Pathology, University of Cape Town, Cape Town, South Africa.; Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Anzio Road, Observatory, Cape Town 7925, South Africa. |
---|---|
Language: | English |
Source: | Briefings in bioinformatics [Brief Bioinform] 2021 Jul 20; Vol. 22 (4). |
DOI: | 10.1093/bib/bbaa366 |
Abstract: | Current variant calling (VC) approaches have been designed to leverage populations of long-range haplotypes and were benchmarked using populations of European descent, whereas most genetic diversity is found in non-European populations, such as African populations. When applied to these genetically diverse populations, VC tools may produce false positive and false negative results, which may lead to misleading conclusions in the prioritization of mutations and in the clinical relevance and actionability of genes. The key question is which tool or pipeline achieves high sensitivity and precision when analysing African data with either low or high sequence coverage, given the high genetic diversity and heterogeneity of these data. Here, a total of 100 synthetic Whole Genome Sequencing (WGS) samples, mimicking the genetic profile of African and European subjects at specific coverage levels (high/low), were generated to assess the performance of nine different VC tools on these contrasting datasets. The performance of these tools was assessed in terms of false positive and false negative call rates by comparing the simulated golden variants with the variants identified by each VC tool. Combining our results on sensitivity and positive predictive value (PPV), VarDict [PPV = 0.999 and Matthews correlation coefficient (MCC) = 0.832] and BCFtools (PPV = 0.999 and MCC = 0.813) perform best on African population data at both high and low coverage. Overall, current VC tools produce higher false positive and false negative rates when analysing African data than when analysing European data. This highlights the need to develop VC approaches with high sensitivity and precision tailored to populations characterized by high genetic variation and low linkage disequilibrium. (© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.) |
Database: | MEDLINE |
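
The abstract evaluates each VC tool by comparing its calls with the simulated golden variants and reporting sensitivity, PPV and MCC. The following is a minimal sketch of how such metrics could be computed, not the authors' evaluation pipeline. It assumes variants are represented as `(chrom, pos, ref, alt)` tuples, and it assumes a hypothetical `n_sites` parameter (the total number of candidate sites considered) so that true negatives, which MCC requires, can be counted. All function names and the toy data are illustrative.

```python
from math import sqrt


def confusion_counts(truth, called, n_sites):
    """Count TP, FP, FN, TN for one caller.

    truth, called: sets of (chrom, pos, ref, alt) tuples (assumed representation).
    n_sites: assumed total number of candidate sites, used only to derive TN.
    """
    tp = len(truth & called)   # golden variants that were called
    fp = len(called - truth)   # calls absent from the golden set
    fn = len(truth - called)   # golden variants that were missed
    tn = n_sites - tp - fp - fn
    return tp, fp, fn, tn


def sensitivity(tp, fn):
    """Fraction of golden variants recovered by the caller."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def ppv(tp, fp):
    """Positive predictive value: fraction of calls that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy data (made up for illustration, not taken from the study).
golden = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr1", 300, "G", "A"),
    ("chr1", 400, "T", "C"),
}
calls = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr1", 300, "G", "A"),
    ("chr1", 500, "A", "T"),  # false positive
}

tp, fp, fn, tn = confusion_counts(golden, calls, n_sites=10)
print(f"sensitivity={sensitivity(tp, fn):.3f}")
print(f"PPV={ppv(tp, fp):.3f}")
print(f"MCC={mcc(tp, fp, fn, tn):.3f}")
```

Because almost every genomic position is a true negative, the choice of `n_sites` strongly affects MCC; sensitivity and PPV do not depend on it.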