Author: |
Xiangling Liu, Xinyu Yang, Linkun Ouyang, Guibing Guo, Jin Su, Ruibin Xi, Ke Yuan, Fajie Yuan |
Year of publication: |
2022 |
DOI: |
10.1101/2022.09.30.510294 |
Description: |
Accurately predicting the effects of mutations in cancer has the potential to improve existing treatments and identify novel therapeutic targets. In this paper, we show for the first time that large-scale pre-trained protein language models (PPLMs) are zero-shot predictors for two clinically relevant tasks: identifying disease-causing mutations and predicting patient survival rate. We then benchmark a series of state-of-the-art (SOTA) PPLMs on 2279 protein variants across 20 cancer-related genes. Our empirical results show that the PPLMs outperform the SOTA baseline, EVE [1], which is trained on multiple sequence alignment (MSA) data. We also demonstrate that the evolutionary index score, derived from the PPLM's softmax layer, is a good indicator of both mutation pathogenicity and patient survival rate. Our paper takes a key step toward the clinical utility of large-scale PPLMs. |
Database: |
OpenAIRE |
External link: |
|