A systematic analysis of regression models for protein engineering.

Authors: Michael R (Department of Computer Science, University of Copenhagen, Copenhagen, Denmark); Kæstel-Hansen J (Department of Chemistry, University of Copenhagen, Copenhagen, Denmark); Mørch Groth P (Department of Computer Science, University of Copenhagen, Copenhagen, Denmark; Enzyme Research, Novozymes A/S, Kongens Lyngby, Denmark); Bartels S (Department of Computer Science, University of Copenhagen, Copenhagen, Denmark); Salomon J (Enzyme Research, Novozymes A/S, Kongens Lyngby, Denmark); Tian P (Enzyme Research, Novozymes A/S, Kongens Lyngby, Denmark); Hatzakis NS (Department of Chemistry, University of Copenhagen, Copenhagen, Denmark); Boomsma W (Department of Computer Science, University of Copenhagen, Copenhagen, Denmark)
Language: English
Source: PLoS computational biology [PLoS Comput Biol] 2024 May 03; Vol. 20 (5), pp. e1012061. Date of Electronic Publication: 2024 May 03 (Print Publication: 2024).
DOI: 10.1371/journal.pcbi.1012061
Abstract: Optimizing proteins for particular traits holds great promise for industrial and pharmaceutical purposes. Machine learning is increasingly applied in this field to predict properties of proteins, thereby guiding the experimental optimization process. A natural question is: How much progress are we making with such predictions, and how important is the choice of regressor and representation? In this paper, we demonstrate that different assessment criteria for regressor performance can lead to dramatically different conclusions, depending on the choice of metric and on how one defines generalization. We highlight the fundamental issues of sample bias in typical regression scenarios and how these can lead to misleading conclusions about regressor performance. Finally, we make the case for the importance of calibrated uncertainty in this domain.
Competing Interests: The authors have declared that no competing interests exist.
(Copyright: © 2024 Michael et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
Database: MEDLINE