Predicting protein phosphorylation sites in soybean using interpretable deep tabular learning network

Author:

Khalili Elham1, Ramazi Shahin2, Ghanati Faezeh1, Kouchaki Samaneh3

Affiliation:

1. Department of Plant Science, Faculty of Science, Tarbiat Modares University, Tehran, Iran

2. Department of Biophysics, Faculty of Biological Science, Tarbiat Modares University, Tehran, Iran

3. Department of Electrical and Electronic Engineering, Faculty of Engineering and Physical Sciences, Centre for Vision, Speech, and Signal Processing, University of Surrey, Guildford, UK

Abstract

Phosphorylation of proteins is one of the most significant post-translational modifications (PTMs) and plays a crucial role in plant function through its impact on signaling, gene expression, enzyme kinetics, protein stability and interactions. Accurate prediction of plant phosphorylation sites (p-sites) is vital because abnormal regulation of phosphorylation usually leads to plant diseases. However, current experimental methods for PTM prediction suffer from high computational cost and are error-prone. The present study develops machine learning-based prediction techniques, including a high-performance interpretable deep tabular learning network (TabNet), to improve the prediction of protein p-sites in soybean. Moreover, we use a hybrid feature set of sequence-based features, physicochemical properties and position-specific scoring matrices to predict serine (Ser/S), threonine (Thr/T) and tyrosine (Tyr/Y) p-sites in soybean for the first time. The experimentally verified p-site data for soybean proteins are collected from the Eukaryotic Phosphorylation Sites Database and the database of post-translational modifications. We then reduce redundancy in the positive and negative samples by dropping protein sequences with >40% similarity. The developed techniques achieve accuracies above 70%. The TabNet model is the best-performing classifier: with hybrid features and a window size of 13, it yields 78.96% sensitivity and 77.24% specificity. These results indicate that the TabNet method offers advantages in both performance and interpretability. The proposed technique can analyze the data automatically, without measurement errors or human intervention. Furthermore, it can be used to predict putative protein p-sites in plants effectively. The collected dataset and source code are publicly deposited at https://github.com/Elham-khalili/Soybean-P-sites-Prediction.
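To make the window size and the reported metrics concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a window of size 13 means the candidate S/T/Y residue plus six residues on each side, and that positions beyond the sequence ends are padded with a placeholder symbol; the padding symbol `X` and the function names `extract_windows` and `sensitivity_specificity` are hypothetical. Sensitivity and specificity follow their standard definitions, TP / (TP + FN) and TN / (TN + FP).

```python
from typing import List, Sequence, Tuple

PAD = "X"                  # assumed padding symbol (not stated in the abstract)
TARGETS = {"S", "T", "Y"}  # serine, threonine, tyrosine


def extract_windows(seq: str, window: int = 13) -> List[Tuple[int, str, str]]:
    """Return (1-based position, residue, window) for every S/T/Y in seq."""
    if window % 2 == 0:
        raise ValueError("window size must be odd so the site is centred")
    half = window // 2
    padded = PAD * half + seq + PAD * half
    windows = []
    for i, residue in enumerate(seq):
        if residue in TARGETS:
            # padded[i : i + window] is centred on seq[i]
            windows.append((i + 1, residue, padded[i:i + window]))
    return windows


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Compute (sensitivity, specificity) for binary labels (1 = p-site)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    # Toy sequence for illustration only; not taken from the soybean dataset.
    for position, residue, win in extract_windows("MKSAT"):
        print(position, residue, win)
    print(sensitivity_specificity([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))
```

Each extracted window would then be encoded with the hybrid features described above before being passed to a classifier; that encoding step is outside the scope of this sketch.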

Funder

National Elite Foundation of Iran

Publisher

Oxford University Press (OUP)

Subject

Molecular Biology, Information Systems
