Analytic optimization of Plasmodium falciparum marker gene haplotype recovery from amplicon deep sequencing of complex mixtures.

Authors: Lapp Z (Duke Global Health Institute, Duke University, Durham, NC, USA); Freedman E (Division of Infectious Diseases, School of Medicine, Duke University, Durham, NC, USA); Huang K (Division of Infectious Diseases, School of Medicine, Duke University, Durham, NC, USA); Markwalter CF (Duke Global Health Institute, Duke University, Durham, NC, USA); Obala AA (School of Medicine, College of Health Sciences, Moi University, Eldoret, Kenya); Prudhomme-O'Meara W (Duke Global Health Institute, Duke University, Durham, NC, USA; Division of Infectious Diseases, School of Medicine, Duke University, Durham, NC, USA); Taylor SM (Duke Global Health Institute, Duke University, Durham, NC, USA; Division of Infectious Diseases, School of Medicine, Duke University, Durham, NC, USA)
Language: English
Source: medRxiv: the preprint server for health sciences [medRxiv] 2023 Aug 23. Date of Electronic Publication: 2023 Aug 23.
DOI: 10.1101/2023.08.17.23294237
Abstract: Molecular epidemiologic studies of malaria parasites commonly employ amplicon deep sequencing (AmpSeq) of marker genes derived from dried blood spots (DBS) to answer public health questions related to topics such as transmission and drug resistance. As these methods are increasingly employed to inform direct public health action, it is important to rigorously evaluate the risk of false positive and false negative haplotypes derived from clinically relevant sample types. We performed a control experiment evaluating haplotype recovery from AmpSeq of 5 marker genes (ama1, csp, msp7, sera2, and trap) from DBS containing mixtures of DNA from 1 to 10 known P. falciparum reference strains across 3 parasite densities in triplicate (n=270 samples). While false positive haplotypes were present across all parasite densities and mixtures, we optimized censoring criteria to remove 83% (148/179) of false positives while removing only 8% (67/859) of true positives. Post-censoring, the median pairwise Jaccard distance between replicates was 0.83. We failed to recover 35% (477/1365) of haplotypes expected to be present in the sample. Haplotypes were more likely to be missed in low-density samples with <1.5 genomes/μL (OR: 3.88, CI: 1.82-8.27, vs. high-density samples with ≥75 genomes/μL) and in samples with lower read depth (OR per 10,000 reads: 0.61, CI: 0.54-0.69). Furthermore, minority haplotypes within a sample were more likely to be missed than dominant haplotypes (OR per 0.01 increase in proportion: 0.96, CI: 0.96-0.97). Finally, in clinical samples the percent concordance across markers for multiplicity of infection ranged from 40% to 80%. Taken together, our observations indicate that, with sufficient read depth, haplotypes can be successfully recovered from DBS while limiting the false positive rate.
Database: MEDLINE
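
Note: The abstract reports a median pairwise Jaccard distance between replicates and several censoring and recovery percentages. The following is a minimal Python sketch of how such quantities could be computed; it is not the authors' pipeline. The function names (`jaccard_distance`, `median_pairwise_distance`, `percent`) and the replicate haplotype sets are hypothetical and chosen only for illustration. The counts passed to `percent` are the ones quoted in the abstract.

```python
from itertools import combinations
from statistics import median


def jaccard_distance(a, b):
    """Jaccard distance between two haplotype sets: 1 - |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Convention assumed here: two empty sets are treated as identical.
        return 0.0
    return 1.0 - len(a & b) / len(union)


def median_pairwise_distance(replicates):
    """Median Jaccard distance over all unordered pairs of replicates."""
    return median(jaccard_distance(x, y) for x, y in combinations(replicates, 2))


def percent(numerator, denominator):
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


# Hypothetical haplotype calls for three replicates of one sample and marker.
replicates = [
    {"hap1", "hap2", "hap3"},
    {"hap1", "hap2"},
    {"hap2", "hap3", "hap4"},
]

if __name__ == "__main__":
    print(median_pairwise_distance(replicates))
    # Counts quoted in the abstract.
    print(percent(148, 179))   # false positives removed by censoring
    print(percent(67, 859))    # true positives removed by censoring
    print(percent(477, 1365))  # expected haplotypes not recovered
```

A distance of 0 means two replicates yielded identical haplotype sets, and a distance of 1 means they shared no haplotypes.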