Statistical Modeling of Short-Tandem Repeat Capillary Electrophoresis Profiles
Author: | Slim Karkar, Desmond S. Lun, Lauren E. Alfonse, Catherine M. Grgicak |
---|---|
Year of publication: | 2018 |
Subject: |
0301 basic medicine; Artifact (error); Computer science; Noise (signal processing); Statistical model; Mixture model; Signal; Amplicon; Size; 03 medical and health sciences; 030104 developmental biology; 0302 clinical medicine; Capillary electrophoresis; Microsatellite; 030216 legal & forensic medicine; Biological system |
Source: | BIBM |
Description: | Objective: Interrogating multiple polymorphic Short Tandem Repeat (STR) locations by PCR and capillary electrophoresis (CE) is the chief technique by which laboratories determine whether an individual contributed DNA to biological material retrieved from the environment. In theory, the CE signal contains substantial information about the length and number of DNA fragments amplified. However, environmental samples are challenging to interpret because little is known about the quantity or quality of the DNA, and the allele signal is often obscured by noise and by PCR artifacts known as stutter. A signal model that effectively captures the components of the STR signal without relying on a priori knowledge of the quantity or quality of DNA is therefore warranted. Results: We first develop a strategy in which we quantify the quality of the profile by examining the degree to which the signal changes with amplicon size. Second, we develop a model for each component of the signal, i.e., allele, stutter artifact, and noise. By examining the out-of-sample prediction error, we identify a model that can be used effectively for downstream interpretation. Significance: The model is selected using a large, diverse collection of profiles obtained under 144 distinct laboratory conditions and across a wide range of DNA template masses, extending from a single copy of DNA to hundreds of copies. Because it is a Gaussian mixture model, it can be readily applied to the analysis of complex DNA samples. |
Database: | OpenAIRE |
External link: |
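
The abstract mentions two steps that a short sketch may make more concrete: measuring how the signal changes with amplicon size, and choosing among candidate models by out-of-sample prediction error. The Python below is only an illustration under stated assumptions, not the authors' method. It assumes a log-linear relationship between peak height and amplicon size, uses k-fold cross-validation as the out-of-sample scheme, and runs on synthetic data; the function names and all numeric values are hypothetical.

```python
import numpy as np


def fit_log_linear(sizes, heights):
    """Least-squares fit of log(height) = a + b * size; returns (a, b).

    The slope b is used here as an assumed, illustrative measure of how
    strongly the signal changes with amplicon size.
    """
    b, a = np.polyfit(sizes, np.log(heights), 1)
    return a, b


def out_of_sample_error(sizes, heights, degree, n_folds=5, seed=0):
    """Mean squared k-fold prediction error of a polynomial fit to log(height)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(sizes))
    folds = np.array_split(idx, n_folds)
    log_h = np.log(heights)
    fold_errors = []
    for held_out in folds:
        # Fit on every point that is not in the held-out fold.
        train = np.setdiff1d(idx, held_out)
        coeffs = np.polyfit(sizes[train], log_h[train], degree)
        pred = np.polyval(coeffs, sizes[held_out])
        fold_errors.append(np.mean((pred - log_h[held_out]) ** 2))
    return float(np.mean(fold_errors))


# Synthetic data only (hypothetical values, not taken from the paper):
# peak heights that decay log-linearly with amplicon size, plus noise.
rng = np.random.default_rng(42)
sizes = rng.uniform(100, 450, size=200)  # amplicon sizes in base pairs
true_a, true_b = 7.0, -0.004
heights = np.exp(true_a + true_b * sizes + rng.normal(0, 0.1, size=200))

a, b = fit_log_linear(sizes, heights)

# Candidate models: constant (degree 0), linear (1), quadratic (2) in size.
errors = {d: out_of_sample_error(sizes, heights, d) for d in (0, 1, 2)}
best_degree = min(errors, key=errors.get)

print(f"estimated slope: {b:.5f} per bp")
print("cross-validated errors by degree:", errors)
print("lowest-error degree:", best_degree)
```

On data like this, the constant model should show a clearly larger held-out error than the linear one, which is the kind of comparison the abstract describes for selecting a model. The same comparison could in principle be applied to candidate distributions for the allele, stutter, and noise components.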