Leveraging a large language model to predict protein phase transition: a physical, multiscale and interpretable approach

Authors

Frank Mor, Ni Pengyu, Jensen Matthew, Gerstein Mark B

Abstract

Protein phase transitions (PPTs) from the soluble state to a dense liquid phase (forming droplets via liquid-liquid phase separation) or to solid aggregates (such as amyloid) play key roles in pathological processes associated with age-related diseases such as Alzheimer’s disease (AD). Several computational frameworks can separately predict the formation of protein droplets or amyloid aggregates from protein sequences, yet none has tackled the prediction of both within a unified framework. Recently, large language models (LLMs) have exhibited great success in protein structure prediction; however, they have not yet been used for PPTs. Here, we fine-tune an LLM for predicting PPTs and demonstrate its superior performance compared to suitable classical benchmarks. Due to the “black-box” nature of the LLM, we also employ a classical random forest model along with biophysical features to facilitate interpretation. Finally, focusing on AD-related proteins, we demonstrate that greater aggregation is associated with reduced gene expression in AD, suggesting a natural defense mechanism.

Significance Statement

The protein phase transition is a physical mechanism associated with both physiological processes and age-related diseases. Here, we present a modeling approach for predicting a protein’s propensity to undergo phase transitions directly from its sequence. Our methodology uses a large language model to estimate the likelihood that a given protein sequence exists in a particular material state. Additionally, for enhanced interpretability, we incorporate a classical knowledge-based model. Our results suggest the potential for accurately predicting the propensity to form either liquid or solid condensates. Furthermore, our findings indicate that this propensity may be regulated by gene expression under pathological conditions to prevent aggregation.

Publisher

Cold Spring Harbor Laboratory
