Systematic Biological Analysis of Tandem Repeats Sequences in Different Species based on Machine Learning

Author: Chung-Yi Kuo, 郭仲翊
Year of publication: 2019
Document type: thesis (學位論文)
Description: 107
Tandem repeat sequences are widely used in genetics, most notably as molecular genetic markers: they typically show high sequence variability between populations and individuals and are codominant. They are therefore also widely used in genetic diversity analysis. Genetics and evolution are closely related, and all species evolved by common descent. A species' genetic information, including its tandem repeats, may therefore carry information about its ancestors. At the same time, the criteria used to classify species reflect the characteristics that organisms of the same class share through common descent, and this property should also be present in tandem repeat sequences. This study therefore analyzes tandem repeats together with species classification, in the hope of finding an association between tandem repeats and evolution. The data set consists of genomes from the NCBI Genome database that have been sequenced and assembled to the Complete or Chromosome level. Following the taxonomic classification system, genomic data for 80 species from 12 phyla were selected. After all tandem repeat patterns were identified with a repeat-finding tool, two families of feature selection methods were used to select representative tandem repeat patterns. Finally, the machine learning algorithms C4.5 and CART were used to build classification models to explore the feasibility of using tandem repeats for species classification.
Database: Networked Digital Library of Theses & Dissertations
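
The first step described in the abstract, locating tandem repeats and turning them into per-genome features, can be illustrated with a minimal sketch. The record does not name the repeat-finding tool or the features the thesis used, so everything here is an assumption: the function names `find_tandem_repeats` and `motif_counts` are hypothetical, the regular-expression search only detects perfect (mismatch-free) repeats, and the thresholds are arbitrary defaults.

```python
import re
from collections import Counter


def find_tandem_repeats(seq, min_unit=1, max_unit=6, min_copies=3):
    """Return (start, unit, copies) for each perfect tandem repeat in seq.

    Naive stand-in for a dedicated repeat finder (an assumption, not the
    thesis's tool): a unit of min_unit..max_unit bases followed directly by
    at least min_copies - 1 further copies of itself.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


def motif_counts(seq, **kwargs):
    """Sum copy numbers per repeat unit: one possible feature vector."""
    counts = Counter()
    for _start, unit, copies in find_tandem_repeats(seq, **kwargs):
        counts[unit] += copies
    return counts
```

For example, `find_tandem_repeats("GGACACACACTTTACGACGACGT")` returns `[(2, "AC", 4), (10, "T", 3), (13, "ACG", 3)]`. Count vectors of this kind, one per genome, are the sort of input a decision-tree learner such as CART or C4.5 could be trained on with phylum as the label.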