Affiliation:
1. Indraprastha Institute of Information Technology, Delhi Okhla Industrial Estate
Abstract
Background
HLA-DRB1*04:01 is associated with many diseases, including sclerosis, arthritis, diabetes, and COVID-19. It is therefore important to scan antigens for HLA-DRB1*04:01 binders in order to develop immunotherapies, vaccines, and protection against these diseases. A major limitation of existing methods for predicting HLA-DRB1*04:01 binders is that they were trained on small datasets. This study presents HLA-DR4Pred2, a method developed on a large dataset containing 12676 binders and an equal number of non-binders. It is an improved version of HLA-DR4Pred, which was trained on a small dataset containing only 576 binders and an equal number of non-binders.
Results
All models in this study were trained, optimized, and internally tested with five-fold cross-validation on 80% of the data (the training dataset); the final models were evaluated on the remaining 20% (the validation, or independent, dataset). A wide range of machine learning techniques was employed to develop prediction models, which achieved maximum AUROCs of 0.90 and 0.87 on the validation dataset using composition and binary profile features, respectively. The performance of our composition-based model increased from 0.90 to 0.93 when it was combined with a BLAST search. In addition, we developed models on an alternate, more realistic dataset containing 12676 binders and 86300 non-binders, and achieved a maximum AUROC of 0.99.
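The feature types and evaluation protocol named above can be illustrated with a minimal sketch. It is not the HLA-DR4Pred2 implementation: the function names, the shuffling seed, the interleaved fold assignment, and the example peptide strings are all assumptions made for illustration, and only the general ideas (amino acid composition, binary profile, 80/20 split, five-fold cross-validation, AUROC) come from the text.

```python
import random

# The 20 standard amino acids; this ordering is an arbitrary choice for the sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(peptide):
    """Percent composition of each standard amino acid (20 values)."""
    peptide = peptide.upper()
    if not peptide:
        raise ValueError("peptide must not be empty")
    return [100.0 * peptide.count(a) / len(peptide) for a in AMINO_ACIDS]


def binary_profile(peptide):
    """One-hot encoding: 20 values per residue, concatenated in sequence order."""
    vec = []
    for res in peptide.upper():
        vec.extend(1 if res == a else 0 for a in AMINO_ACIDS)
    return vec


def split_80_20(items, seed=0):
    """Shuffle and split into 80% training and 20% validation (seed is hypothetical)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = int(0.8 * len(items))
    return items[:cut], items[cut:]


def five_folds(train, k=5):
    """Yield (fit, held_out) pairs; every item is held out exactly once."""
    for i in range(k):
        held_out = train[i::k]
        fit = [x for j, x in enumerate(train) if j % k != i]
        yield fit, held_out


def auroc(labels, scores):
    """AUROC as the fraction of (binder, non-binder) pairs ranked correctly.

    Ties count as half a correct pair. Labels are 1 (binder) or 0 (non-binder).
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one binder and one non-binder")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical peptide, used only to show the feature dimensions.
    print(len(aa_composition("GILGFVFTL")), len(binary_profile("GILGFVFTL")))
```

Composition features have a fixed length of 20 regardless of peptide length, whereas a binary profile grows by 20 values per residue, so it is normally applied to peptides of a fixed length or to a fixed-size window.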
Conclusions
Our best model outperforms existing methods when their performance is compared on the validation dataset. Finally, we developed standalone and online versions of HLA-DR4Pred2 for predicting, designing, and virtually scanning HLA-DRB1*04:01 binders (https://webs.iiitd.edu.in/raghava/hladr4pred2/; https://github.com/raghavagps/hladr4pred2).
Publisher
Research Square Platform LLC