Development and Validation of a Novel Protein−Ligand Fingerprint To Mine Chemogenomic Space: Application to G Protein-Coupled Receptors and Their Ligands
Authors: | Didier Rognan, Nathanael Weill |
---|---|
Publication year: | 2009 |
Subject: |
Databases, Factual; General Chemical Engineering; Library and Information Sciences; Bit array; Ligands; Machine learning; Peptide Mapping; Receptors, G-Protein-Coupled; Artificial Intelligence; Humans; Computer Simulation; Binding site; Mathematics; G protein-coupled receptor; Subject Headings; Binding Sites; Ligand; Novel protein; Hydrogen Bonding; Pattern recognition; General Chemistry; Chemical space; Computer Science Applications; Support vector machine; Statistical classification; Models, Chemical |
Source: | Journal of Chemical Information and Modeling. 49:1049-1062 |
ISSN: | 1549-960X 1549-9596 |
Description: | The present study introduces a novel low-dimensionality fingerprint encoding both ligand and target properties, which is suitable for mining protein-ligand chemogenomic space. Whereas ligand properties are represented by standard descriptors, protein cavities are encoded by a fixed-length bit string describing pharmacophoric properties of a fixed number of binding site residues. In order to simplify the cavity fingerprint, the concept was applied here to a single family of targets (G protein-coupled receptors) with a homogeneous cavity description. Particular attention was given to setting up data sets of truly diverse protein-ligand pairs covering both ligand and target spaces as exhaustively as possible. Several machine learning classification algorithms were trained on two sets of roughly 200,000 receptor-ligand fingerprints with different definitions of inactive decoys. Cross-validated models show excellent precision (>0.9) in distinguishing true from false pairs, with a particular preference for support vector machine classifiers. When applied to two external test sets of GPCR ligands, the most predictive models were not those that performed best in the previous cross-validation. The ability to recover true GPCR ligands (ligand prediction mode) or true GPCRs (receptor prediction mode) depends on multiple parameters: the molecular complexity of the ligands, the chemical space from which ligand decoys are selected to generate false protein-ligand pairs, and the target space under consideration. In most cases, predicting ligands is easier than predicting receptors. Although receptor profiling is possible, it probably requires a more detailed description of the ligand-binding site. Notably, protein-ligand fingerprints outperform the corresponding ligand fingerprints in mining the GPCR-ligand space. 
Since they can be applied to a much larger number of receptors than ligand-based fingerprints, protein-ligand fingerprints represent a novel and promising way to directly screen protein-ligand pairs in chemogenomic applications. |
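The fingerprint construction described in the abstract (standard ligand descriptors joined to a fixed-length bit string for a fixed number of binding site residues) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the `PROPERTIES` list, the `RESIDUE_PROPERTIES` table, the bit order, and the function names are all hypothetical.

```python
# Hypothetical pharmacophoric properties, one bit each per residue.
# The actual property set used in the paper is not given in this record.
PROPERTIES = (
    "hydrophobic", "aromatic", "hbond_donor",
    "hbond_acceptor", "positive", "negative",
)

# Hypothetical, deliberately incomplete residue-to-property table
# (one-letter amino acid codes).
RESIDUE_PROPERTIES = {
    "A": {"hydrophobic"},
    "F": {"hydrophobic", "aromatic"},
    "S": {"hbond_donor", "hbond_acceptor"},
    "D": {"hbond_acceptor", "negative"},
    "K": {"hbond_donor", "positive"},
}


def cavity_fingerprint(residues):
    """Encode an ordered list of binding site residues as a bit list.

    Each residue contributes len(PROPERTIES) bits, so a fixed number of
    residues always yields a fixed-length cavity fingerprint.
    """
    bits = []
    for res in residues:
        props = RESIDUE_PROPERTIES[res]  # KeyError for residues not in the table
        bits.extend(1 if p in props else 0 for p in PROPERTIES)
    return bits


def pair_fingerprint(ligand_bits, residues):
    """Concatenate ligand bits and cavity bits into one protein-ligand vector."""
    return list(ligand_bits) + cavity_fingerprint(residues)


# Usage: a toy 3-bit ligand descriptor paired with a 2-residue cavity.
example = pair_fingerprint([1, 0, 1], ["D", "F"])
print(len(example), example)
```

A vector built this way, together with a true/false pair label, is the kind of row that a classifier such as a support vector machine could be trained on. The sketch stops at the encoding step because the record gives no details of the training setup.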
Database: | OpenAIRE |
External link: |