Abstract
The synonymous codon composition of a gene is well known to play an important role in its expression level. A number of algorithms optimise the codon usage of recombinant genes to maximise their expression in host cells. Nevertheless, the problem has not yet been solved and remains relevant. In modern biotechnology, directing protein production to a specific level is crucial for metabolic engineering, genome rewriting, and a growing number of other applications. In this study, we propose two new, simple statistical and empirical methods for predicting the protein expression level from the nucleotide sequence of the corresponding gene: the Codon Expression Index Score (CEIS) and the Codon Productivity Score (CPS). Both methods are based on the influence of each individual codon in the gene on the overall expression level of the encoded protein and on the frequencies of isoacceptors in the species. Our predictions achieve a correlation of up to r = 0.7 with experimentally measured quantitative proteome data of Escherichia coli, which is superior to any previously proposed method. Our work helps to understand how codons determine translation rates; based on our methods, it is possible to design protein-coding sequences optimised for expression in a particular organism.
Publisher
Cold Spring Harbor Laboratory