Description: |
This study analyzes and extends the work of Biswas et al. on low-N protein engineering with data-efficient deep learning. We provide a complete, open-source, end-to-end re-implementation of the in silico protein engineering pipeline with improved computational efficiency, more detailed documentation, a cleaner API, and additional features, lowering the barrier to entry for using this pipeline as an engineering tool. We also evaluate more thoroughly the success and necessity of each step in the pipeline for in silico directed evolution. To do so, we re-implement select portions of the study of TEM-1 β-lactamase and apply the full in silico pipeline to two novel protein engineering tasks: increasing the melting temperature of the plastic-degrading enzyme IsPETase and improving the thermostability of the MS2 bacteriophage capsid protein. By comparing the performance of various UniRep-based feature representations, we show that linear kernels can be equivalent to additive fitness landscapes and can outperform more complex models on small or simple mutation prediction tasks. This has been assumed in many previous works but never explicitly shown. We believe this result helps elucidate the main strength of the eUniRep representation: its ability to overcome epistatic effects when proposing extensively mutated candidate sequences with optimized functionality. |