TCR2vec: a deep representation learning framework of T-cell receptor sequence and function


Jiang Yuepeng, Huo Miaozhe, Zhang Pingping, Zou Yiping, Li Shuai Cheng


Abstract

T-cell receptor (TCR) repertoires are critical components of the adaptive immune system, and machine learning methods have been proposed to analyze TCR repertoire data. However, most methods work solely on the hypervariable CDR3 regions of TCRs, overlooking the information encoded in other domains. Representing full TCRs as informative vectors can be indispensable for developing reliable and effective machine learning models. We introduce TCR2vec, a deep representation learning framework with 12 layers of transformer blocks, to pave the way for downstream modeling of full TCRs. Together with masked language modeling (MLM), we propose a novel pretraining task named similarity preservation modeling (SPM) to capture the sequence similarities of TCRs. Through a multi-task pretraining procedure on MLM and SPM, TCR2vec learns a contextual understanding of TCRs within a similarity-preserved vector space. We first verify the effectiveness of TCR2vec in predicting TCR binding specificity and in TCR clustering through comparison with three other embedding approaches. TCR2vec can be finetuned on small task-specific labeled datasets for enhanced performance, outperforming state-of-the-art models by 2-25% in predicting TCR binding specificity. Next, we compare two versions of the model, one pretrained on full TCRs (TCR2vec) and one pretrained on CDR3s (CDR3vec), and demonstrate that TCR2vec consistently outperforms CDR3vec by 12-14%. Further analysis of attention maps reveals that residues outside CDR3 also make notable contributions to the recognition of antigens. TCR2vec is available at


Cold Spring Harbor Laboratory

