GNNfam

Authors:	Anuj Godase, Md. Khaledur Rahman, Ariful Azad
Year of publication:	2021
Subject:	Dense graph; Computer science; Margin (machine learning); Deep learning; Graph (abstract data type); Pairwise comparison; Pattern recognition; Pruning (decision trees); Artificial intelligence; Cluster analysis; Clustering coefficient
Source:	BCB
Description:	We present GNNfam, a pipeline for predicting protein families from protein sequences. GNNfam aligns proteins using the pairwise sequence aligner LAST, creates a sparse graph based on the alignment scores, and employs graph neural networks (GNNs) to predict protein families. Unlike alignment-free deep learning methods such as DeepFam, GNNfam can control the sparsity of the protein similarity graph to prune uninformative edges. We develop three pruning strategies to improve the prediction accuracy, convergence, and running time of the downstream graph neural networks. We also demonstrate that semi-supervised GNNs outperform traditional graph clustering-based methods by a large margin. When trained with three labeled sequence datasets from the SCOPe and COG databases, GNNfam achieves more than 90% test accuracy when predicting protein families and performs significantly better than clustering, embedding, and other deep learning methods. GNNfam is available at https://github.com/HipGraph/GNNfam.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::c158318f08f370f2b7bd2f312cbfe81b https://doi.org/10.1145/3459930.3469538
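
The description mentions building a sparse graph from pairwise alignment scores and pruning uninformative edges before the GNN stage. The record does not say what the three pruning strategies are, so the following is only a minimal sketch of one plausible approach: a score threshold followed by keeping the top-k neighbours of each protein. The function name `build_pruned_graph`, the parameters `min_score` and `top_k`, and the `(query, target, score)` input layout are all assumptions made for illustration; they are not taken from the GNNfam code or from LAST's output format.

```python
from collections import defaultdict


def build_pruned_graph(hits, min_score=0.0, top_k=2):
    """Build an undirected similarity graph from alignment hits and prune it.

    hits: iterable of (query, target, score) tuples (hypothetical layout).
    Returns a set of edges (u, v, score) with u < v.
    """
    # Keep the best score per unordered pair; drop self-hits and weak hits.
    best = {}
    for query, target, score in hits:
        if query == target or score < min_score:
            continue
        key = (query, target) if query < target else (target, query)
        if key not in best or score > best[key]:
            best[key] = score

    # Collect each vertex's neighbours together with the edge score.
    neighbours = defaultdict(list)
    for (u, v), score in best.items():
        neighbours[u].append((score, v))
        neighbours[v].append((score, u))

    # Keep an edge if it is among the top_k strongest for either endpoint.
    kept = set()
    for u, scored in neighbours.items():
        scored.sort(reverse=True)
        for score, v in scored[:top_k]:
            kept.add((min(u, v), max(u, v), score))
    return kept


if __name__ == "__main__":
    # Made-up scores, used only to exercise the function.
    example_hits = [
        ("a", "b", 90), ("b", "a", 85), ("a", "c", 40),
        ("a", "d", 10), ("b", "c", 70), ("a", "a", 100), ("c", "d", 5),
    ]
    for edge in sorted(build_pruned_graph(example_hits, min_score=8, top_k=1)):
        print(edge)
```

Keeping an edge when either endpoint ranks it in its top k avoids isolating proteins that have only weak hits, at the cost of leaving some high-degree vertices with more than k edges. The resulting edge list could then be handed to a GNN library as the graph structure, which is outside the scope of this sketch.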