Bacteriophage classification for assembled contigs using graph convolutional network

Authors: Yanni Sun, Jiayu Shang, Jingzhe Jiang
Language: English
Year of publication: 2021
Subject:
FOS: Computer and information sciences
Statistics and Probability
Computer Science - Machine Learning
Source code
AcademicSubjects/SCI01060
Computer science
media_common.quotation_subject
Computational biology
Biochemistry
Convolutional neural network
DNA sequencing
Machine Learning (cs.LG)
Bacteriophage
03 medical and health sciences
Protein sequencing
Quantitative Biology - Genomics
Bioinformatics of Microbes and Microbiomes
Bacteriophages
Molecular Biology
030304 developmental biology
media_common
Genomics (q-bio.GN)
0303 health sciences
biology
Contig
030302 biochemistry & molecular biology
High-Throughput Nucleotide Sequencing
biology.organism_classification
Computer Science Applications
Computational Mathematics
Computational Theory and Mathematics
Metagenomics
FOS: Biological sciences
Graph (abstract data type)
Metagenome
Software
Source: Bioinformatics
ISSN: 1367-4811
1367-4803
Description: Motivation: Bacteriophages (phages), which mainly infect bacteria, play key roles in the biology of microbes. Phages are the most abundant biological entities on the planet, and those discovered so far are only the tip of the iceberg. Recently, many new phages have been revealed by high-throughput sequencing, particularly metagenomic sequencing. The taxonomic classification of phages lags seriously behind the fast accumulation of phage-like sequences. The high diversity and abundance of phages, together with the limited number of known phages, pose great challenges for taxonomic analysis. In particular, alignment-based tools have difficulty classifying the fast-accumulating contigs assembled from metagenomic data. Results: In this work, we present a novel semi-supervised learning model, named PhaGCN, to conduct taxonomic classification of phage contigs. In this model, we construct a knowledge graph by combining the DNA sequence features learned by a convolutional neural network (CNN) with the protein sequence similarity obtained from a gene-sharing network. We then apply a graph convolutional network (GCN) that uses both labeled and unlabeled samples in training to enhance the learning ability. We tested PhaGCN on both simulated and real sequencing data. The results clearly show that our method competes favorably against available phage classification tools.
Comment: 15 pages, 10 figures
Database: OpenAIRE
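
Illustrative note: the abstract describes applying a graph convolutional network to a graph whose nodes are contigs, training on labeled and unlabeled nodes together. The sketch below is not the PhaGCN implementation. It is a minimal NumPy forward pass of a generic two-layer GCN with the commonly used symmetric normalization, assuming a toy 4-node graph, random node features standing in for CNN-derived features, random weights, and three hypothetical classes. The function names, shapes, and data are all assumptions made for illustration.

```python
import numpy as np


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2: add self-loops, then scale symmetrically."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_forward(adj, features, w1, w2):
    """Two graph-convolution layers followed by a row-wise softmax."""
    a_norm = normalize_adjacency(adj)
    hidden = np.maximum(a_norm @ features @ w1, 0.0)  # ReLU
    logits = a_norm @ hidden @ w2
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def masked_cross_entropy(probs, labels, labeled_mask):
    """Average the loss over labeled nodes only (the semi-supervised part)."""
    picked = probs[labeled_mask, labels[labeled_mask]]
    return float(-np.log(picked + 1e-12).mean())


# Hypothetical toy data: a 4-node path graph in which only nodes 0 and 1 are labeled.
rng = np.random.default_rng(0)
adj = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)
features = rng.normal(size=(4, 8))     # stand-in for learned sequence features
w1 = rng.normal(size=(8, 5)) * 0.1     # untrained weights, illustration only
w2 = rng.normal(size=(5, 3)) * 0.1
labels = np.array([0, 2, 0, 0])        # entries for unlabeled nodes are ignored
labeled_mask = np.array([True, True, False, False])

probs = gcn_forward(adj, features, w1, w2)
loss = masked_cross_entropy(probs, labels, labeled_mask)
```

Unlabeled nodes still shape the result because their features propagate to neighbors through the normalized adjacency matrix, even though they contribute nothing to the loss directly. A trained model would update `w1` and `w2` by gradient descent on this masked loss.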