Learning the Language of Genes: Representing Global Codon Bias with Deep Language Models

Authors: Fujimoto, M. Stanley, Bodily, Paul M., Lyman, Cole A., Jacobsen, J. Andrew, Clement, Mark J.
Publication year: 2017
Subject:
Source: Utah Space Grant Consortium
Description: Codon bias, the pattern of synonymous codon usage when a protein sequence is encoded as nucleotides, is a biological phenomenon that is not well understood. Current methods that measure and model the codon bias of an organism exist for use in codon optimization. In synthetic biology, codon optimization is the task of selecting the appropriate codons to reverse translate a protein sequence into a nucleotide sequence so as to maximize expression in a vector. The features these methods model include the codon adaptation index (CAI) [1], individual codon usage (ICU), hidden stop codons (HSC) [2], and codon context (CC) [3]. While explicitly modeling these features has helped us engineer proteins with high synthesis yield, it is unclear what other biological features should be taken into account during codon selection to maximize protein synthesis. In this article, we present a method for modeling global codon bias with deep language models. The method is more robust than current methods because it allows more contextual information and long-range dependencies to be considered during codon selection.
Database: OpenAIRE