NTpred: A robust and precise machine learning framework for in-silico identification of Tyrosine nitration sites in protein sequences

Author:

Datta Sourajyoti, Asim Muhammad Nabeel, Dengel Andreas, Ahmed Sheraz

Abstract

Post-translational modifications (PTMs) either enhance a protein's activity in various sub-cellular processes or degrade it, which can lead to the failure of intracellular processes. Tyrosine nitration (NT) degrades protein activity, which initiates and propagates various diseases, including neurodegenerative, cardiovascular, and autoimmune diseases, as well as carcinogenesis. Identifying NT modifications supports the development of novel therapies and drug discovery for the associated diseases. Identifying NT modifications in biochemical labs is expensive, time-consuming, and error-prone. To supplement this process, several computational approaches have been proposed. However, these approaches fail to identify NT modifications precisely because they extract irrelevant, redundant, and less discriminative features from protein sequences. This paper presents NTpred, a framework that extracts comprehensive features from raw protein sequences using four different sequence encoders. To reap the benefits of the different encoders, it generates four additional feature spaces by fusing different combinations of the individual encodings. It then removes irrelevant and redundant features from the eight feature spaces through a Recursive Feature Elimination process. The selected features of the four individual encodings and the four fused feature vectors are used to train eight Gradient Boosted Tree classifiers. The probability scores from the trained classifiers form a new probabilistic feature space, which is used to train a Logistic Regression classifier. On the BD1 benchmark dataset, the proposed framework outperforms the existing best-performing predictor in 5-fold cross-validation and independent test evaluation, with a combined improvement of 13.7% in MCC and 20.1% in AUC. Similarly, on the BD2 benchmark dataset, the proposed framework outperforms the existing best-performing predictor, with a combined improvement of 5.3% in MCC and 1.0% in AUC.
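
The pipeline described in the abstract (four encodings, four fused spaces, Recursive Feature Elimination, eight Gradient Boosted Tree classifiers, and a Logistic Regression meta-classifier) can be sketched roughly as follows with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the encoders or the fused combinations, so the four encodings are random synthetic blocks, the fusion choices are hypothetical, and the use of out-of-fold probabilities for the meta-features is an assumption made here to avoid label leakage. All function names and hyperparameters are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import matthews_corrcoef, roc_auc_score
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.pipeline import make_pipeline


def build_feature_spaces(encodings):
    """Return the 4 individual encodings plus 4 fused (concatenated) spaces."""
    names = list(encodings)
    # Hypothetical fusion choices; the abstract does not say which
    # combinations of encodings are fused.
    fusions = [names[:2], names[2:], names[:3], names]
    spaces = dict(encodings)
    for combo in fusions:
        spaces["+".join(combo)] = np.hstack([encodings[n] for n in combo])
    return spaces


def make_base_model(n_features):
    """RFE keeps half of the features (an arbitrary choice), then a GBT classifies."""
    selector = RFE(
        GradientBoostingClassifier(n_estimators=20, random_state=0),
        n_features_to_select=max(1, n_features // 2),
        step=0.25,  # drop 25% of the original features per elimination round
    )
    classifier = GradientBoostingClassifier(n_estimators=30, random_state=0)
    return make_pipeline(selector, classifier)


def run_stacking_sketch(seed=0):
    # Synthetic data standing in for encoded tyrosine sites (1 = nitrated).
    X, y = make_classification(
        n_samples=240, n_features=40, n_informative=12, random_state=seed
    )
    # Four stand-in "encodings": consecutive 10-column blocks of X.
    encodings = {f"enc{i}": X[:, 10 * i:10 * (i + 1)] for i in range(4)}
    spaces = build_feature_spaces(encodings)

    train_idx, test_idx = train_test_split(
        np.arange(len(y)), test_size=0.25, stratify=y, random_state=seed
    )

    meta_train, meta_test = [], []
    for features in spaces.values():
        model = make_base_model(features.shape[1])
        # Out-of-fold probabilities for the training rows (assumption: this
        # keeps the meta-classifier from training on leaked labels).
        oof = cross_val_predict(
            model, features[train_idx], y[train_idx], cv=3, method="predict_proba"
        )[:, 1]
        model.fit(features[train_idx], y[train_idx])
        meta_train.append(oof)
        meta_test.append(model.predict_proba(features[test_idx])[:, 1])

    # One probability column per feature space: the "probabilistic feature space".
    meta_train = np.column_stack(meta_train)
    meta_test = np.column_stack(meta_test)

    meta_clf = LogisticRegression().fit(meta_train, y[train_idx])
    prob = meta_clf.predict_proba(meta_test)[:, 1]
    return {
        "n_spaces": meta_train.shape[1],
        "mcc": matthews_corrcoef(y[test_idx], meta_clf.predict(meta_test)),
        "auc": roc_auc_score(y[test_idx], prob),
    }


if __name__ == "__main__":
    print(run_stacking_sketch())
```

Because the data here are synthetic, the printed MCC and AUC say nothing about NTpred's reported performance; the sketch only shows how eight per-space probability scores can be stacked into the input of a final Logistic Regression classifier.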

Publisher

Cold Spring Harbor Laboratory
