Fine-tuning Protein Language Models with Deep Mutational Scanning improves Variant Effect Prediction

Author:	Lafita, Aleix; Gonzalez, Ferran; Hossam, Mahmoud; Smyth, Paul; Deasy, Jacob; Allyn-Feuer, Ari; Seaton, Daniel; Young, Stephen
Publication year:	2024
Subject:	Quantitative Biology - Genomics; Computer Science - Machine Learning
Document type:	Working Paper
Description:	Protein Language Models (PLMs) have emerged as performant and scalable tools for predicting the functional impact and clinical significance of protein-coding variants, but they still lag behind experimental accuracy. Here, we present a novel fine-tuning approach to improve the performance of PLMs with experimental maps of variant effects from Deep Mutational Scanning (DMS) assays using a Normalised Log-odds Ratio (NLR) head. We find consistent improvements on a held-out protein test set, and on independent DMS and clinical variant annotation benchmarks from ProteinGym and ClinVar. These findings demonstrate that DMS is a promising source of sequence diversity and supervised training data for improving the performance of PLMs for variant effect prediction. Comment: Machine Learning for Genomics Explorations workshop at ICLR 2024
Database:	arXiv
External link:	http://arxiv.org/abs/2405.06729