Protein-protein interaction prediction using a hybrid feature representation and a stacked generalization scheme

Authors: Tsai Feng Wang, Kuan Hsi Chen, Yuh-Jyh Hu
Year of publication: 2019
Subject:
Scheme (programming language)
Computer science
Generalization
lcsh:Computer applications to medicine. Medical informatics
Network topology
Machine learning
computer.software_genre
Biochemistry
03 medical and health sciences
Protein-protein interaction
0302 clinical medicine
Structural Biology
Interaction network
Protein Interaction Mapping
Feature (machine learning)
Animals
Humans
Stacked generalization
Databases
Protein

Representation (mathematics)
lcsh:QH301-705.5
Molecular Biology
030304 developmental biology
computer.programming_language
0303 health sciences
business.industry
Methodology Article
Applied Mathematics
Molecular Sequence Annotation
Ensemble learning
Computer Science Applications
ComputingMethodologies_PATTERNRECOGNITION
lcsh:Biology (General)
Area Under Curve
030220 oncology & carcinogenesis
lcsh:R858-859.7
Gene ontology
Protein–protein interaction prediction
Artificial intelligence
business
computer
Algorithms
Source: BMC Bioinformatics, Vol 20, Iss 1, Pp 1-17 (2019)
BMC Bioinformatics
ISSN: 1471-2105
DOI: 10.1186/s12859-019-2907-1
Description: Background: Although various machine learning-based predictors have been developed for estimating protein–protein interactions (PPIs), their performance varies with dataset and species and depends on two primary factors: the choice of learning algorithm and the representation of protein pairs. To improve PPI prediction, we exploit the synergy of multiple learning algorithms and the expressiveness of different protein-pair features. Results: We developed a stacked generalization scheme that integrates five learning algorithms. We also designed three types of protein-pair features, based on the physicochemical properties of amino acids, gene ontology annotations, and interaction network topologies. When tested on 19 published datasets collected from eight species, the proposed approach achieved significantly higher or comparable overall performance compared with seven competitive predictors. Conclusion: We introduced an ensemble learning approach for PPI prediction that integrates multiple learning algorithms and different protein-pair representations. Extensive comparisons with other state-of-the-art prediction tools demonstrated the feasibility and superiority of the proposed method.
Database: OpenAIRE
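
Illustration: the abstract describes a stacked generalization scheme in which several learning algorithms are combined by a second-level learner. The abstract does not name the five algorithms or specify the feature encodings, so the following is only a minimal sketch of the general technique, not the authors' implementation. It assumes three simple stand-in base learners (nearest centroid, k-nearest neighbours, logistic regression), a logistic-regression meta-learner, and synthetic two-dimensional vectors in place of real protein-pair features. All function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Clamp z so math.exp cannot overflow.
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))


def logistic_learner(X, y, epochs=200, lr=0.1):
    """Logistic regression fitted by plain stochastic gradient descent."""
    w = [0.0] * (len(X[0]) + 1)  # w[0] is the bias term
    for _ in range(epochs):
        for x, t in zip(X, y):
            g = sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], x))) - t
            w[0] -= lr * g
            for i, xi in enumerate(x):
                w[i + 1] -= lr * g * xi
    return lambda x: sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], x)))


def centroid_learner(X, y):
    """Score in [0, 1] that grows as x moves toward the class-1 centroid."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    c0 = mean([x for x, t in zip(X, y) if t == 0])
    c1 = mean([x for x, t in zip(X, y) if t == 1])

    def predict(x):
        d0, d1 = math.dist(x, c0), math.dist(x, c1)
        return d0 / (d0 + d1 + 1e-12)
    return predict


def knn_learner(X, y, k=3):
    """Fraction of the k nearest training points labelled 1."""
    def predict(x):
        nearest = sorted(zip(X, y), key=lambda p: math.dist(x, p[0]))[:k]
        return sum(t for _, t in nearest) / k
    return predict


def fit_stack(X, y, base_makers, meta_maker=logistic_learner, n_folds=5, seed=0):
    """Stacked generalization: train the meta-learner on out-of-fold base scores."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    meta_X = [None] * len(X)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        models = [make([X[i] for i in train], [y[i] for i in train])
                  for make in base_makers]
        for i in fold:
            # Each sample is scored only by models that never saw it.
            meta_X[i] = [m(X[i]) for m in models]
    meta = meta_maker(meta_X, y)
    bases = [make(X, y) for make in base_makers]  # refit on all data
    return lambda x: meta([b(x) for b in bases])


def make_blobs(n_per_class, rng):
    """Synthetic stand-in for protein-pair feature vectors (assumption)."""
    X, y = [], []
    for label, center in ((0, 0.0), (1, 3.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
            y.append(label)
    return X, y


rng = random.Random(1)
X_train, y_train = make_blobs(40, rng)
X_test, y_test = make_blobs(40, rng)
predict = fit_stack(X_train, y_train,
                    [centroid_learner, knn_learner, logistic_learner])
accuracy = sum((predict(x) >= 0.5) == bool(t)
               for x, t in zip(X_test, y_test)) / len(y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

The key design point is that the meta-learner is trained on out-of-fold scores, so it learns how much to trust each base learner on data that learner has not seen. In a real setting the synthetic vectors would be replaced by concatenated protein-pair features, such as the physicochemical, gene ontology, and network topology features mentioned in the abstract.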