Protein–protein interaction site prediction by model ensembling with hybrid feature and self-attention

Authors:	Hanhan Cong, Hong Liu, Yi Cao, Cheng Liang, Yuehui Chen
Language:	English
Publication year:	2023
Subjects:	Protein–protein interaction; Hybrid feature; Self-attention; Integration framework; Computer applications to medicine. Medical informatics; R858-859.7; Biology (General); QH301-705.5
Source:	BMC Bioinformatics, Vol 24, Iss 1, Pp 1-21 (2023)
Document type:	article
ISSN:	1471-2105
DOI:	10.1186/s12859-023-05592-7
Description:	Abstract. Background: Protein–protein interactions (PPIs) are crucial in various biological functions and cellular processes, and many computational approaches have been proposed to predict PPI sites. Although significant progress has been made, these methods still have limitations in encoding the characteristics of each amino acid in a sequence. Many feature extraction methods rely on the sliding window technique, which simply merges the features of all residues in the window into one vector. The importance of some key residues may be weakened in that feature vector, leading to poor performance. Results: We propose a novel sequence-based method for PPI site prediction. The new network model, PPINet, contains multiple feature processing paths. For each residue, PPINet extracts the features of the target residue and of its context separately. These two types of features are processed by two paths in the network and then combined to form a protein representation in which the two types of features carry relatively equal importance. Model ensembling is applied to make use of more features: the base models are trained with different features and then ensembled via stacking. In addition, a data balancing strategy is presented that gives our model a significant improvement on highly unbalanced data. Conclusion: The proposed method is evaluated on a fused dataset constructed from Dset_186, Dset_72, and PDBset_164, as well as on the public Dset_448 dataset. It outperforms current state-of-the-art methods. On the most important metrics, AUPRC and recall, it surpasses the second-best method on the latter dataset by 6.9% and 4.7%, respectively. We also demonstrate that the improvement is essentially due to the ensemble model and, especially, to the hybrid feature. We share our code for reproducibility and future research at https://github.com/CandiceCong/StackingPPINet .
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/a4e4a11da08441f9860c26d5f75b848a
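
The abstract contrasts the sliding window technique, which merges the features of every residue in the window into one vector, with extracting the target residue's features and its context's features separately. The following is a minimal sketch of that contrast only, not the authors' PPINet code. The one-hot encoding, the averaging of neighbour features, the `-` padding symbol, and all function names are assumptions made for illustration; the record does not specify any of them.

```python
from typing import List, Tuple

# The 20 standard amino acids; any other symbol (including padding) maps to zeros.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(residue: str) -> List[float]:
    """Return a 20-dimensional one-hot vector (all zeros for unknown symbols)."""
    return [1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS]


def window_residues(sequence: str, index: int, half_width: int) -> List[str]:
    """Residues at positions index-half_width .. index+half_width, padded with '-'."""
    return [
        sequence[i] if 0 <= i < len(sequence) else "-"
        for i in range(index - half_width, index + half_width + 1)
    ]


def merged_window_features(sequence: str, index: int, half_width: int) -> List[float]:
    """Sliding-window style: concatenate the features of all residues in the window."""
    features: List[float] = []
    for residue in window_residues(sequence, index, half_width):
        features.extend(one_hot(residue))
    return features


def split_features(
    sequence: str, index: int, half_width: int
) -> Tuple[List[float], List[float]]:
    """Keep the target residue and its context as two separate vectors.

    Assumption: the context is summarised as the mean of the neighbours'
    one-hot vectors, so both outputs have the same length.
    """
    if half_width < 1:
        raise ValueError("half_width must be at least 1 to have any context")
    window = window_residues(sequence, index, half_width)
    target = one_hot(window[half_width])
    neighbours = [r for pos, r in enumerate(window) if pos != half_width]
    encoded = [one_hot(r) for r in neighbours]
    context = [sum(column) / len(neighbours) for column in zip(*encoded)]
    return target, context


if __name__ == "__main__":
    seq = "ACDA"  # toy sequence, hypothetical
    merged = merged_window_features(seq, index=1, half_width=1)
    target, context = split_features(seq, index=1, half_width=1)
    print(len(merged), len(target), len(context))
```

In the merged vector the target residue occupies only one of `2 * half_width + 1` equally sized slots, whereas the split form gives the target and its context one vector each, which a two-path model could then process separately.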