iDNA-MS: An Integrated Computational Tool for Detecting DNA Modification Sites in Multiple Genomes
Authors: | Zheng-Xing Guan, Fu-Ying Dao, Hui Ding, Wei Su, Wei Chen, Hao Lin, Hao Lv, Meng-Lu Liu, Dan Zhang, Hui Yang |
---|---|
Year of publication: | 2020 |
Subject: | 0301 basic medicine; Scheme (programming language); Bioinformatics; Computer science; Generalization; 02 engineering and technology; Computational biology; computer.software_genre; Genome; Article; Quantitative Genetics; 03 medical and health sciences; DNA Modification; Component (UML); Genetics; lcsh:Science; computer.programming_language; Multidisciplinary; Construct (python library); 021001 nanoscience & nanotechnology; Random forest; Identification (information); 030104 developmental biology; lcsh:Q; Data mining; 0210 nano-technology; computer |
Source: | iScience, Vol 23, Iss 4, Pp-(2020) |
ISSN: | 1556-5068 |
Description: | Summary: 5hmC, 6mA, and 4mC are three common DNA modifications involved in various biological processes. Accurate genome-wide identification of these sites is invaluable for better understanding their biological functions. Because experimental methods are labor-intensive and expensive, computational methods for genome-wide detection of these sites are urgently needed. With this in mind, the current study was devoted to constructing a computational method to identify 5hmC, 6mA, and 4mC sites. Samples were first formulated using K-tuple nucleotide component, nucleotide chemical property and nucleotide frequency, and a mono-nucleotide binary encoding scheme. Subsequently, random forest was used to identify 5hmC, 6mA, and 4mC sites. Cross-validated results showed that the proposed method achieves excellent generalization ability in identifying the three types of modification sites. Based on the proposed model, a web server called iDNA-MS was established and is freely accessible at http://lin-group.cn/server/iDNA-MS. Highlights: • A computational tool was developed for the identification of 5hmC, 6mA, and 4mC • 6mA and 4mC mark similar regions in the C. equisetifolia and F. vesca genomes • 5hmC is enriched in the initial and middle parts of DNA loops • A user-friendly web server is available at http://lin-group.cn/server/iDNA-MS Subject areas: Genetics; Quantitative Genetics; Bioinformatics |
Database: | OpenAIRE |
External link: |
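
The description names three sequence encodings (K-tuple nucleotide component, nucleotide chemical property with nucleotide frequency, and mono-nucleotide binary encoding) without giving their definitions. The Python sketch below shows one common way such encodings are computed. It is an illustration under assumptions, not the authors' implementation: the function names, the chemical-property codes in `NCP`, the prefix-based frequency definition, and the choice `ks=(1, 2, 3)` are all hypothetical and do not come from this record.

```python
from itertools import product

ALPHABET = "ACGT"

# Assumed property codes: (ring structure, functional group, hydrogen bonding).
# This mapping is a common convention, not taken from the record above.
NCP = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "T": (0, 0, 1)}


def ktuple_composition(seq, k):
    """Frequency of every possible k-mer, in lexicographic order."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1  # number of sliding windows
    if total <= 0:
        return [0.0] * len(kmers)
    for i in range(total):
        counts[seq[i:i + k]] += 1
    return [counts[m] / total for m in kmers]


def ncp_with_frequency(seq):
    """Per position: three property bits plus the running frequency
    of that nucleotide in the prefix ending at the position."""
    features = []
    seen = {}
    for i, base in enumerate(seq, start=1):
        seen[base] = seen.get(base, 0) + 1
        features.extend(NCP[base])
        features.append(seen[base] / i)
    return features


def binary_encoding(seq):
    """One-hot (mono-nucleotide binary) encoding, 4 bits per position."""
    features = []
    for base in seq:
        features.extend(1 if base == s else 0 for s in ALPHABET)
    return features


def encode(seq, ks=(1, 2, 3)):
    """Concatenate the three encodings into one feature vector.
    The values in `ks` are illustrative, not the paper's settings."""
    seq = seq.upper()
    if any(base not in ALPHABET for base in seq):
        raise ValueError("sequence must contain only A, C, G, T")
    vector = []
    for k in ks:
        vector.extend(ktuple_composition(seq, k))
    vector.extend(ncp_with_frequency(seq))
    vector.extend(binary_encoding(seq))
    return vector
```

Vectors built this way for fixed-length sequence windows could then be passed to a random forest classifier, for example scikit-learn's `RandomForestClassifier`; whether the authors used that library is not stated in this record.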