Deep-4mCW2V: A sequence-based predictor to identify N4-methylcytosine sites in Escherichia coli.

Authors: Zulfiqar H; Center for Informational Biology and School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China., Sun ZJ; Center for Informational Biology and School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China., Huang QL; Center for Informational Biology and School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China., Yuan SS; Center for Informational Biology and School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China., Lv H; Center for Informational Biology and School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China., Dao FY; Center for Informational Biology and School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China., Lin H; Center for Informational Biology and School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China. Electronic address: hlin@uestc.edu.cn., Li YW; School of Information Science and Technology, Northeast Normal University, Changchun 130117, China; Key Laboratory of Intelligent Information Processing of Jilin Province, Northeast Normal University, Changchun 130117, China; Institute of Computational Biology, Northeast Normal University, Changchun 130117, China. Electronic address: liyw085@nenu.edu.cn.
Language: English
Source: Methods (San Diego, Calif.) [Methods] 2022 Jul; Vol. 203, pp. 558-563. Date of Electronic Publication: 2021 Aug 02.
DOI: 10.1016/j.ymeth.2021.07.011
Abstract: N4-methylcytosine (4mC) is a DNA modification that can regulate several biological processes such as transcription, replication, and gene expression. Precisely identifying 4mC sites in genomic sequences can provide specific knowledge about their genetic roles. This study aimed to develop a deep learning-based model to predict 4mC sites in Escherichia coli. In the model, DNA sequences were encoded with the word embedding technique 'word2vec'. The resulting features were fed into a 1-D convolutional neural network (CNN) to discriminate 4mC sites from non-4mC sites in the Escherichia coli genome. Evaluation on an independent dataset showed that our model achieved an overall accuracy of 0.861, about 4.3% higher than the existing model. For the convenience of other researchers, the data and source code of the model can be freely downloaded from https://github.com/linDing-groups/Deep-4mCW2V.
(Copyright © 2021 Elsevier Inc. All rights reserved.)
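Illustrative note: the abstract describes a pipeline in which DNA sequences are split into "words", mapped to embedding vectors, and passed through a 1-D convolution. The following is a minimal sketch of that data flow only, not the authors' implementation. The k-mer length (k = 3), the embedding dimension, the kernel values, and the function names are all assumptions for illustration, and the embedding table is filled with random numbers as a stand-in for trained word2vec vectors. It assumes sequences contain only A, C, G, and T.

```python
import random


def kmer_tokens(seq, k=3):
    """Split a DNA sequence into overlapping k-mers (the 'words')."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_embedding_table(k=3, dim=4, seed=0):
    """Hypothetical stand-in for word2vec: one random vector per k-mer."""
    rng = random.Random(seed)
    kmers = [""]
    for _ in range(k):
        kmers = [prefix + base for prefix in kmers for base in "ACGT"]
    return {km: [rng.uniform(-1, 1) for _ in range(dim)] for km in kmers}


def embed(seq, table, k=3):
    """Map a sequence to a (num_tokens x dim) list of embedding vectors."""
    return [table[token] for token in kmer_tokens(seq, k)]


def conv1d_valid(x, kernel, bias=0.0):
    """Slide one filter (width x dim) along the token axis, then apply ReLU."""
    width = len(kernel)
    out = []
    for i in range(len(x) - width + 1):
        s = bias
        for j in range(width):
            s += sum(a * b for a, b in zip(x[i + j], kernel[j]))
        out.append(max(0.0, s))
    return out


table = build_embedding_table()
x = embed("ACGTCCGGA", table)            # 9 bases give 7 overlapping 3-mers
kernel = [[0.5] * 4 for _ in range(2)]   # one filter spanning 2 tokens (arbitrary weights)
feature_map = conv1d_valid(x, kernel)    # 6 positions
pooled = max(feature_map)                # global max pooling over positions
```

In a real model the embedding table would come from a word2vec model trained on k-mer "sentences", and many filters followed by a classification layer would be learned from labeled 4mC and non-4mC sequences.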
Database: MEDLINE