DEEPred: Automated Protein Function Prediction with Multi-task Feed-forward Deep Neural Networks

Authors: Maria Jesus Martin, Volkan Atalay, Ahmet Sureyya Rifaioglu, Rengul Cetin-Atalay, Tunca Doğan
Contributors: [Rifaioglu, Ahmet Sureyya, Atalay, Volkan] METU, Dept Comp Engn, TR-06800 Ankara, Turkey, [Rifaioglu, Ahmet Sureyya] Iskenderun Tech Univ, Dept Comp Engn, TR-31200 Antakya, Turkey, [Dogan, Tunca, Martin, Maria Jesus] EBI, European Mol Biol Lab, Cambridge CB10 1SD, England, [Cetin-Atalay, Rengul, Atalay, Volkan] METU, Dept Hlth Informat, Grad Sch Informat, KanSiL, TR-06800 Ankara, Turkey, Dogan, Tunca -- 0000-0002-1298-9763, Cetin-Atalay, Rengul -- 0000-0003-2408-6606, Mühendislik ve Doğa Bilimleri Fakültesi -- Elektrik-Elektronik Mühendisliği Bölümü, Rifaioğlu, Ahmet Süreyya, OpenMETU
Language: English
Year of publication: 2019
Subject:
0301 basic medicine
Source code
Computer science
lcsh:Medicine
Overfitting
computer.software_genre
0302 clinical medicine
Data_FILES
ComputingMilieux_COMPUTERSANDEDUCATION
Data Mining
Protein function prediction
lcsh:Science
GeneralLiterature_REFERENCE(e.g., dictionaries, encyclopedias, glossaries)
media_common
Multidisciplinary
Process (computing)
alignment
Multidisciplinary Sciences
Sequence annotation
annotation
Pseudomonas aeruginosa
media_common.quotation_subject
InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL
Protein function predictions
Machine learning
Models, Biological
Article
03 medical and health sciences
Deep Learning
Bacterial Proteins
Humans
Pseudomonas Infections
business.industry
Deep learning
lcsh:R
Feed forward
Proteins
sequence
Gene Ontology
030104 developmental biology
Biofilms
lcsh:Q
Neural Networks, Computer
Proteins | Genes | Protein functions
Artificial intelligence
business
computer
Software
030217 neurology & neurosurgery
Source: Scientific Reports, Vol 9, Iss 1, Pp 1-16 (2019)
Scientific Reports
ISSN: 2045-2322
Description: WOS: 000467839800015
31089211
Automated protein function prediction is critical for the annotation of uncharacterized protein sequences, where accurate prediction methods are still required. Recently, deep learning-based methods have outperformed conventional algorithms in computer vision and natural language processing due to the prevention of overfitting and efficient training. Here, we propose DEEPred, a hierarchical stack of multi-task feed-forward deep neural networks, as a solution to Gene Ontology (GO) based protein function prediction. DEEPred was optimized through rigorous hyper-parameter tests, and benchmarked using three types of protein descriptors, training datasets of varying sizes and GO terms from different levels. Furthermore, in order to explore how training with larger but potentially noisy data would change the performance, electronically made GO annotations were also included in the training process. The overall predictive performance of DEEPred was assessed using the CAFA2 and CAFA3 challenge datasets, in comparison with state-of-the-art protein function prediction methods. Finally, we evaluated selected novel annotations produced by DEEPred with a literature-based case study considering the 'biofilm formation process' in Pseudomonas aeruginosa. This study reports that deep learning algorithms have significant potential in protein function prediction, particularly when the source data is large. The neural network architecture of DEEPred can also be applied to the prediction of other types of ontological associations. The source code and all datasets used in this study are available at: https://github.com/cansyl/DEEPred.
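As a rough illustration of the idea named in the abstract (a stack of multi-task feed-forward networks, each scoring several GO terms at once), the following is a minimal sketch and not the DEEPred implementation. The class name, layer sizes, random untrained weights, placeholder term labels, and the grouping of one model per GO level are all assumptions made for illustration; the actual architecture and training procedure are in the authors' repository.

```python
import math
import random


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def sigmoid(values):
    """Element-wise logistic function, giving one independent score per task."""
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def dense(inputs, weights, bias):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (untrained, for illustration only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class MultiTaskFeedForward:
    """One shared hidden layer feeding one sigmoid output per GO term."""

    def __init__(self, n_features, n_hidden, go_terms, seed=0):
        rng = random.Random(seed)
        self.go_terms = list(go_terms)
        self.hidden = init_layer(n_features, n_hidden, rng)
        self.output = init_layer(n_hidden, len(self.go_terms), rng)

    def predict(self, features):
        hidden = relu(dense(features, *self.hidden))
        scores = sigmoid(dense(hidden, *self.output))
        return dict(zip(self.go_terms, scores))


def predict_all(stack, features):
    """Run every model in the stack and merge the per-term scores."""
    scores = {}
    for model in stack:
        scores.update(model.predict(features))
    return scores


# Hypothetical stack: one multi-task model per GO level.
# The term labels below are placeholders, not real GO identifiers.
stack = [
    MultiTaskFeedForward(8, 4, ["TERM_A", "TERM_B"], seed=1),
    MultiTaskFeedForward(8, 4, ["TERM_C", "TERM_D", "TERM_E"], seed=2),
]

example_features = [0.5] * 8  # stand-in for a protein descriptor vector
predictions = predict_all(stack, example_features)
```

Each output unit is scored independently with a sigmoid because a protein can carry several GO terms at once, which is why a multi-label rather than a softmax output is used in this sketch.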
YOK OYP scholarship
The authors would like to thank Andrew Nightingale for the critical reading of the manuscript. ASR was supported by YOK OYP scholarship.
Database: OpenAIRE