Protein-small molecule binding site prediction based on a pre-trained protein language model with contrastive learning

Authors: Jue Wang, Yufan Liu, Boxue Tian
Language: English
Year of publication: 2024
Source: Journal of Cheminformatics, Vol 16, Iss 1, Pp 1-21 (2024)
Document type: article
ISSN: 1758-2946
DOI: 10.1186/s13321-024-00920-2
Description: Abstract: Predicting protein-small molecule binding sites, the initial step in structure-guided drug design, remains challenging for proteins lacking experimentally derived ligand-bound structures. Here, we propose CLAPE-SMB, which integrates a pre-trained protein language model with contrastive learning to provide high-accuracy predictions of small molecule binding sites, including for proteins without a published crystal structure. We trained and tested CLAPE-SMB on the SJC dataset, a non-redundant dataset based on sc-PDB, JOINED, and COACH420, and achieved an MCC of 0.529. We also compiled the UniProtSMB dataset, which merges sites from similar proteins based on raw data from the UniProtKB database, and achieved an MCC of 0.699 on its test set. In addition, CLAPE-SMB achieved an MCC of 0.815 on our intrinsically disordered protein (IDP) dataset, which contains 336 non-redundant sequences. Case studies of DAPK1, RebH, and Nep1 support the potential of this binding site prediction tool to aid in drug design. The code and datasets are freely available at https://github.com/JueWangTHU/CLAPE-SMB. Scientific contribution: CLAPE-SMB combines a pre-trained protein language model with contrastive learning to accurately predict protein-small molecule binding sites, especially for proteins without experimental structures, such as IDPs. Trained and evaluated on several datasets, the model adapts well across them, making it a valuable tool for advancing drug design and understanding protein-small molecule interactions.
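The abstract reports performance as MCC (Matthews correlation coefficient), a standard metric for imbalanced binary classification. Binding-site prediction is typically imbalanced because binding residues are a small minority of a sequence. The following is a minimal sketch of the standard MCC formula, assuming predictions are binary labels per residue (1 = binding, 0 = non-binding). The function name `mcc` and the toy labels are illustrative assumptions and are not taken from the CLAPE-SMB code.

```python
import math
from typing import Sequence


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for binary per-residue labels.

    Assumes 1 marks a binding residue and 0 a non-binding residue
    (an illustrative convention, not necessarily the paper's).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have equal length")
    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when any row/column total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy example: 8 residues, 3 true binding residues.
true_labels = [1, 1, 0, 0, 0, 0, 1, 0]
pred_labels = [1, 0, 0, 0, 1, 0, 1, 0]
print(mcc(true_labels, pred_labels))  # TP=2, TN=4, FP=1, FN=1 -> 7/15
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction). Unlike plain accuracy, it is not inflated by a model that simply labels every residue as non-binding, which is why it is a common headline metric for this task.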
Database: Directory of Open Access Journals