Affiliation:
1. Centre of Biotechnology & Microbiology, University of Peshawar, Peshawar, Pakistan
2. College of Software Convergence, Sejong University, Seoul, South Korea
Abstract
Background:
Peptidases are a group of enzymes that catalyze the cleavage of peptide
bonds. Around 2-3% of the whole genome codes for proteases, and about one-third of all known
proteases are serine proteases, which are divided into 13 clans and 40 families. They are involved
in diverse physiological roles such as digestion, coagulation of blood, fibrinolysis, processing of
proteins and prohormones, signaling pathways, complement fixation, and have a vital role in the
immune defense system. Based on their functions, they can broadly be divided into two classes:
GASPIDs (Granule Associated Serine Peptidases involved in Immune Defense System) and
Non-GASPIDs. GASPIDs, in particular, are involved in immune-associated functions, i.e., initiating
apoptosis to kill virally infected and cancerous cells, cytokine modulation for the generation of
inflammatory responses, and direct killing of pathogens through phagosomes.
Methods:
In this study, sequence-based characterization of these two types of serine proteases is
performed. We first identified sequences by searching multiple online databases and by
analyzing whole genomes of different orthologous and non-orthologous species. Sequences
were selected using a distinct criterion devised to differentiate GASPIDs from
Non-GASPIDs. The translated sequences were then subjected to feature extraction.
Using these distinctive features, we differentiated GASPIDs from Non-GASPIDs by applying
multiple supervised machine learning models.
Results and Conclusion:
Our results show that, among the three classifiers used in this study,
the SVM classifier coupled with the tripeptide feature method showed the best accuracy in
classifying sequences as GASPIDs or Non-GASPIDs.
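The abstract does not spell out how the tripeptide features were computed. As a minimal sketch, one common form of tripeptide composition counts every overlapping three-residue window in a protein sequence and normalizes by the number of windows, giving a fixed-length vector of 20^3 = 8000 values. The function name `tripeptide_composition`, the normalization, and the toy sequence below are assumptions for illustration, not the authors' implementation.

```python
from collections import Counter
from itertools import product

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All 20^3 = 8000 possible tripeptides in a fixed order, so that every
# sequence is mapped to a vector with the same layout.
TRIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]
TRIPEPTIDE_INDEX = {t: i for i, t in enumerate(TRIPEPTIDES)}


def tripeptide_composition(sequence):
    """Return an 8000-element list of overlapping tripeptide frequencies.

    Assumption: windows containing non-standard residues (e.g. 'X')
    are skipped, and frequencies are normalized over the valid windows.
    """
    seq = sequence.upper()
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    valid = {t: c for t, c in counts.items() if t in TRIPEPTIDE_INDEX}
    total = sum(valid.values())

    vector = [0.0] * len(TRIPEPTIDES)
    if total == 0:
        return vector  # sequence too short or no valid windows
    for tripeptide, count in valid.items():
        vector[TRIPEPTIDE_INDEX[tripeptide]] = count / total
    return vector


if __name__ == "__main__":
    # Arbitrary toy sequence, used only to exercise the function.
    toy_sequence = "IIGGHEAKPHS"
    features = tripeptide_composition(toy_sequence)
    print(len(features), sum(1 for v in features if v > 0))
```

Vectors of this kind, one per sequence, could then be passed with GASPID / Non-GASPID labels to any supervised classifier, for example an SVM implementation such as scikit-learn's `SVC`; which library and kernel the study used is not stated in the abstract.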
Publisher
Bentham Science Publishers Ltd.
Subject
Computational Mathematics, Genetics, Molecular Biology, Biochemistry