Identifying promising sequences for protein engineering using a deep transformer protein language model

Published: 2023-06-20, Issue 11, Volume 91, Pages 1471-1486
ISSN: 0887-3585
Journal: Proteins: Structure, Function, and Bioinformatics
Language: English
Journal abbreviation: Proteins

Authors:

Frisby Trevor S.¹, Langmead Christopher James¹

Affiliation:

1. Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA

Abstract

Protein engineers aim to discover and design novel sequences with targeted, desirable properties. Given the near limitless size of the protein sequence landscape, it is no surprise that these desirable sequences are often a relative rarity. This makes identifying such sequences a costly and time‐consuming endeavor. In this work, we show how to use a deep transformer protein language model to identify sequences that have the most promise. Specifically, we use the model's self‐attention map to calculate a Promise Score that weights the relative importance of a given sequence according to predicted interactions with a specified binding partner. This Promise Score can then be used to identify strong binders worthy of further study and experimentation. We use the Promise Score within two protein engineering contexts: Nanobody (Nb) discovery and protein optimization. With Nb discovery, we show how the Promise Score provides an effective way to select lead sequences from Nb repertoires. With protein optimization, we show how to use the Promise Score to select site‐specific mutagenesis experiments that identify a high percentage of improved sequences. In both cases, we also show how the self‐attention map used to calculate the Promise Score can indicate which regions of a protein are involved in intermolecular interactions that drive the targeted property. Finally, we describe how to fine‐tune the transformer protein language model to learn a predictive model for the targeted property, and discuss the capabilities and limitations of fine‐tuning with and without knowledge transfer within the context of protein engineering.
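The abstract does not state how the Promise Score is computed from the self-attention map. As a rough illustration only, the sketch below assumes one attention map per candidate (for example, already averaged over layers and heads), computed on a concatenated input with the binder first and the binding partner second, and scores each candidate by how much attention its binder positions send to partner positions. This cross-attention average is a swapped-in stand-in, not the authors' Promise Score formula. The function names `per_residue_cross_attention`, `cross_attention_score` and `rank_candidates` and the toy matrices `TOY_A` and `TOY_B` are hypothetical and made up for illustration; extracting attention maps from an actual protein language model is assumed and not shown.

```python
from typing import List, Sequence, Tuple

# An attention map is an n x n matrix: row i holds how much position i attends
# to every position j. ASSUMPTION (hypothetical setup, not from the paper): one
# map per candidate, computed on the concatenated input [binder; partner].
Matrix = List[List[float]]


def per_residue_cross_attention(attn: Matrix, binder_len: int) -> List[float]:
    """For each binder position, the total attention it sends to partner positions.

    Higher values hint at binder residues the model links to the partner, one
    possible way to look for regions involved in an interaction.
    """
    n = len(attn)
    if not 0 < binder_len < n:
        raise ValueError("binder_len must split the input into two non-empty parts")
    if any(len(row) != n for row in attn):
        raise ValueError("attention map must be square")
    return [sum(attn[i][binder_len:]) for i in range(binder_len)]


def cross_attention_score(attn: Matrix, binder_len: int) -> float:
    """Hypothetical sequence-level score: mean binder-to-partner attention.

    If entries are non-negative and every row sums to 1 (softmax attention),
    the result lies in [0, 1].
    """
    profile = per_residue_cross_attention(attn, binder_len)
    return sum(profile) / len(profile)


def rank_candidates(
    candidates: Sequence[Tuple[str, Matrix, int]],
) -> List[Tuple[str, float]]:
    """Score each (name, attention map, binder length) and sort best first."""
    scored = [(name, cross_attention_score(attn, blen)) for name, attn, blen in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy 4 x 4 row-stochastic maps (invented values, not data):
# positions 0-1 are the binder, positions 2-3 are the partner.
TOY_A: Matrix = [
    [0.1, 0.1, 0.4, 0.4],
    [0.2, 0.2, 0.3, 0.3],
    [0.25, 0.25, 0.25, 0.25],
    [0.25, 0.25, 0.25, 0.25],
]
TOY_B: Matrix = [
    [0.5, 0.3, 0.1, 0.1],
    [0.4, 0.4, 0.1, 0.1],
    [0.25, 0.25, 0.25, 0.25],
    [0.25, 0.25, 0.25, 0.25],
]

if __name__ == "__main__":
    # Rank the toy candidates and print each name with its rounded score.
    for name, score in rank_candidates([("B", TOY_B, 2), ("A", TOY_A, 2)]):
        print(name, round(score, 3))
```

Sorting by such a score mirrors, in spirit, picking lead sequences from a repertoire, and the per-residue profile is the kind of signal one might inspect for interaction regions. A real pipeline would also have to decide which layers and heads to use, which this sketch deliberately leaves open.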

Publisher

Wiley

Subject

Molecular Biology, Biochemistry, Structural Biology

Link

https://onlinelibrary.wiley.com/doi/pdf/10.1002/prot.26536

References (68 articles)

1. Introduction to current and future protein therapeutics: A protein engineering perspective

2. Antibody Structure and Function: The Basis for Engineering Therapeutics

3. Development of therapeutic antibodies for the treatment of diseases

4. Developing therapeutic monoclonal antibodies at pandemic pace

5. Converting enzymes into tools of industrial importance. Prasad S. Recent Pat Biotechnol, 2018.

Cited by 2 articles.

1. Proteins Need Extra Attention: Improving the Predictive Power of Protein Language Models on Mutational Datasets with Hint Tokens. 2023-12-07.

2. The Engineering, Expression, and Immobilization of Epimerases for D-allulose Production. International Journal of Molecular Sciences, 2023-08-11.