A feature extraction free approach for protein interactome inference from co-elution data.

Authors: Chen YH; Bioinformatics Program, Taiwan International Graduate Program, National Taiwan University, Taipei 106, Taiwan.; Bioinformatics Program, Taiwan International Graduate Program, Academia Sinica, Taipei 11529, Taiwan.; Institute of Information Science, Academia Sinica, Taipei, 11529, Taiwan., Chao KH; Institute of Information Science, Academia Sinica, Taipei, 11529, Taiwan., Wong JY; Institute of Information Science, Academia Sinica, Taipei, 11529, Taiwan., Liu CF; Institute of Molecular Biology, Academia Sinica, Taipei, 11529, Taiwan., Leu JY; Institute of Molecular Biology, Academia Sinica, Taipei, 11529, Taiwan., Tsai HK; Bioinformatics Program, Taiwan International Graduate Program, National Taiwan University, Taipei 106, Taiwan.; Bioinformatics Program, Taiwan International Graduate Program, Academia Sinica, Taipei 11529, Taiwan.; Institute of Information Science, Academia Sinica, Taipei, 11529, Taiwan.
Language: English
Source: Briefings in bioinformatics [Brief Bioinform] 2023 Jul 20; Vol. 24 (4).
DOI: 10.1093/bib/bbad229
Abstract: Protein complexes are key functional units in cellular processes. High-throughput techniques, such as co-fractionation coupled with mass spectrometry (CF-MS), have advanced protein complex studies by enabling global interactome inference. However, identifying true interactions from complex fractionation profiles is not a simple task, because CF-MS is prone to false positives caused by non-interacting proteins that co-elute by chance. Several computational methods have been designed to analyze CF-MS data and construct probabilistic protein-protein interaction (PPI) networks. Current methods usually first infer PPIs from handcrafted CF-MS features and then use clustering algorithms to form potential protein complexes. While powerful, these methods have two weaknesses: handcrafted features based on domain knowledge might introduce bias, and the severely imbalanced PPI data make them prone to overfitting. To address these issues, we present a balanced end-to-end learning architecture, Software for Prediction of Interactome with Feature-extraction Free Elution Data (SPIFFED), which uses a convolutional neural network to integrate feature representation from raw CF-MS data with interactome prediction. SPIFFED outperforms state-of-the-art methods in predicting PPIs under conventional imbalanced training. When trained with balanced data, SPIFFED shows greatly improved sensitivity for true PPIs. Moreover, the ensemble SPIFFED model provides different voting schemes to integrate predicted PPIs from multiple CF-MS datasets. Combined with clustering software (i.e. ClusterONE), SPIFFED allows users to infer high-confidence protein complexes depending on the CF-MS experimental design. The source code of SPIFFED is freely available at: https://github.com/bio-it-station/SPIFFED.
(© The Author(s) 2023. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.)
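The abstract mentions that the ensemble model offers different voting schemes for combining PPIs predicted from multiple CF-MS datasets, but it does not say what those schemes are. As a purely illustrative sketch, assuming each per-dataset model outputs a list of predicted interacting protein pairs, the code below shows three generic vote thresholds (union, strict majority, intersection). The function name `vote` and the scheme names `any`, `majority` and `unanimous` are hypothetical and are not SPIFFED's API; the repository linked above is the authority on the actual schemes.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List, Set, Tuple

# An undirected PPI is represented as an unordered pair of protein IDs.
Pair = FrozenSet[str]


def normalize(pairs: Iterable[Tuple[str, str]]) -> Set[Pair]:
    """Treat (A, B) and (B, A) as one interaction and drop self-pairs."""
    return {frozenset(p) for p in pairs if p[0] != p[1]}


def vote(per_dataset: List[Iterable[Tuple[str, str]]], scheme: str = "majority") -> Set[Pair]:
    """Combine PPI predictions from several datasets by counting votes.

    Hypothetical schemes (illustration only, not SPIFFED's definitions):
      'any'       keep a pair predicted in at least one dataset (union)
      'majority'  keep a pair predicted in more than half of the datasets
      'unanimous' keep a pair predicted in every dataset (intersection)
    """
    sets = [normalize(p) for p in per_dataset]
    n = len(sets)
    # Each dataset contributes at most one vote per pair, since sets deduplicate.
    counts = Counter(pair for s in sets for pair in s)
    thresholds = {"any": 1, "majority": n // 2 + 1, "unanimous": n}
    if scheme not in thresholds:
        raise ValueError(f"unknown scheme: {scheme}")
    k = thresholds[scheme]
    return {pair for pair, c in counts.items() if c >= k}


if __name__ == "__main__":
    # Toy predictions from three hypothetical CF-MS experiments.
    runs = [
        [("A", "B"), ("B", "C")],
        [("B", "A"), ("C", "D")],
        [("A", "B"), ("B", "C")],
    ]
    for scheme in ("any", "majority", "unanimous"):
        print(scheme, sorted(tuple(sorted(p)) for p in vote(runs, scheme)))
    # any [('A', 'B'), ('B', 'C'), ('C', 'D')]
    # majority [('A', 'B'), ('B', 'C')]
    # unanimous [('A', 'B')]
```

Stricter schemes trade sensitivity for confidence: the union keeps every candidate, while the intersection keeps only pairs reproduced across all experiments, which would be the more conservative input to a clustering step such as ClusterONE.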
Database: MEDLINE