Statistical Modeling of Short-Tandem Repeat Capillary Electrophoresis Profiles

Authors: Slim Karkar, Desmond S. Lun, Lauren E. Alfonse, Catherine M. Grgicak
Year of publication: 2018
Subject:
Source: BIBM
Description: Objective: Interrogating multiple polymorphic Short Tandem Repeat (STR) locations by way of PCR and capillary electrophoresis (CE) is the chief technique by which laboratories determine whether an individual contributed their DNA to biological material retrieved from the environment. In theory, the CE signal contains substantial information about the length and number of the DNA fragments amplified. However, environmental samples are challenging to interpret because little is known about the quantity or quality of the DNA, and the allele signal is often obscured by noise and by PCR artifacts known as stutter. Thus, a signal model that effectively describes the components of the STR signal without relying on a priori knowledge of the quantity or quality of the DNA is warranted. Results: We first develop a strategy in which we quantify the quality of the profile by examining the degree to which the signal changes with amplicon size. Second, we develop a model for each component of the signal, i.e., allele, stutter artifact, and noise. By examining the out-of-sample prediction error, we identify a model that can be effectively utilized for downstream interpretation. Significance: The model is selected using a large, diverse collection of profiles obtained under 144 distinct laboratory conditions and over a large range of DNA template masses, extending from a single copy of DNA to hundreds of copies. As a Gaussian mixture model, it can be readily applied to analyze complex DNA samples.
Database: OpenAIRE
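
Note: The description above mentions quantifying profile quality by how the signal changes with amplicon size, but does not say how this is computed. One possible reading, offered only as a minimal sketch, is to fit a straight line to the logarithm of peak height against amplicon size and use the slope as a quality indicator. The log-linear form, the function name log_height_slope, and the synthetic peak values below are all assumptions for illustration, not the authors' method or data.

```python
import math


def log_height_slope(sizes_bp, heights_rfu):
    """Least-squares slope of ln(peak height) against amplicon size.

    Hypothetical helper: a strongly negative slope would suggest that
    longer amplicons give weaker signal, as expected for degraded DNA.
    """
    logs = [math.log(h) for h in heights_rfu]
    n = len(sizes_bp)
    mean_x = sum(sizes_bp) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes_bp)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes_bp, logs))
    return sxy / sxx


# Synthetic peaks (not from the paper): heights decay exponentially
# with amplicon size at an assumed rate of 0.004 per base pair.
sizes = [100, 150, 200, 250, 300, 350, 400]
heights = [1000.0 * math.exp(-0.004 * s) for s in sizes]

slope = log_height_slope(sizes, heights)
print(f"slope of ln(height) vs. size: {slope:.4f} per bp")
```

On the synthetic input the fitted slope recovers the assumed decay rate; real profiles would add noise and stutter peaks, which the sketch does not attempt to model.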