VPatho: a deep learning-based two-stage approach for accurate prediction of gain-of-function and loss-of-function variants

Author:

Ge Fang (1,2), Li Chen (3), Iqbal Shahid (3,4), Muhammad Arif (5), Li Fuyi (6,7), Thafar Maha A (8), Yan Zihao (1), Worachartcheewan Apilak (5), Xu Xiaofeng (9), Song Jiangning (3,4), Yu Dong-Jun (1)

Affiliation:

1. Nanjing University of Science and Technology, School of Computer Science and Engineering, 200 Xiaolingwei, Nanjing 210094, China

2. Bengbu University, School of Computer Science and Information Engineering, 1866 Caoshan Road, Bengbu 233030, China

3. Monash University, Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Melbourne, VIC 3800, Australia

4. Monash University, Monash Data Futures Institute, Melbourne, VIC 3800, Australia

5. Mahidol University, Department of Community Medical Technology, Faculty of Medical Technology, Bangkok 10700, Thailand

6. The University of Melbourne, Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, Melbourne, Victoria, Australia

7. Northwest A&F University, College of Information Engineering, Yangling 712100, China

8. Taif University, Department of Computer Science, College of Computers and Information Technology, P.O. Box 110099, Taif 21944, Saudi Arabia

9. Anhui Polytechnic University, School of Computer and Information, Beijingzhong Road, Wuhu 241000, China

Abstract

Determining the pathogenicity and functional impact (i.e. gain-of-function, GOF, or loss-of-function, LOF) of a variant is vital for unraveling the genetic-level mechanisms of human diseases. To provide a ‘one-stop’ framework for accurately identifying the pathogenicity and functional impact of variants, we developed a two-stage deep-learning-based computational solution, termed VPatho, which was trained on 9619 pathogenic GOF/LOF variants and 138 026 neutral variants curated from various databases. In total, 138 variant-level, 262 protein-level and 103 genome-level features were extracted to construct the VPatho models. VPatho consists of two stages: (i) a random under-sampling multi-scale residual neural network (ResNet) with a newly defined weighted-loss function (RUS-Wg-MSResNet), which predicts variant pathogenicity and was developed on the gnomAD_NV + GOF/LOF dataset; and (ii) an XGBOD model, which predicts the functional impact of the given variants. Benchmarking experiments demonstrated that RUS-Wg-MSResNet achieved the highest prediction performance when the loss weights were calculated from the ratios of neutral to pathogenic variants. Independent tests showed that both RUS-Wg-MSResNet and XGBOD achieved outstanding performance. Moreover, when assessed on variants from the CAGI6 competition, RUS-Wg-MSResNet outperformed state-of-the-art predictors. The trained XGBOD models were further applied in a blind test to the whole LOF dataset downloaded from gnomAD, which identified 31 non-LOF variants that had previously been labeled as LOF or uncertain. As an implementation of the developed approach, a VPatho webserver is publicly available at http://csbio.njust.edu.cn/bioinf/vpatho/ to facilitate community-wide efforts to profile and prioritize query variants with respect to their pathogenicity and functional impact.
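The first stage combines random under-sampling with a loss whose weights derive from the ratio of neutral to pathogenic variants. The abstract does not give the exact loss definition, so the following is only a minimal sketch under stated assumptions: the function names are hypothetical, the weighting scheme (minority class weighted by the neutral:pathogenic ratio) is an assumed reading, and a plain class-weighted binary cross-entropy stands in for the authors' newly defined loss. Only the two dataset counts come from the abstract.

```python
import math
import random

N_PATHOGENIC = 9619   # pathogenic GOF/LOF variants (from the abstract)
N_NEUTRAL = 138026    # neutral variants (from the abstract)


def class_weights(n_neutral, n_pathogenic):
    """Assumed scheme: weight the pathogenic class by the neutral:pathogenic ratio."""
    return {0: 1.0, 1: n_neutral / n_pathogenic}


def weighted_bce(y_true, p_pred, weights, eps=1e-12):
    """Class-weighted binary cross-entropy, averaged over samples."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        total += weights[y] * loss
    return total / len(y_true)


def random_under_sample(labels, ratio=1.0, seed=0):
    """Keep all positives and a random subset of negatives (ratio * n_pos of them)."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(neg), int(round(ratio * len(pos))))
    return sorted(pos + rng.sample(neg, n_keep))
```

With these counts the pathogenic class would receive a weight of roughly 14.3, so a misclassified pathogenic variant costs about fourteen times as much as a misclassified neutral one. Under-sampling and loss weighting are shown as separate helpers here; how the two are actually combined in RUS-Wg-MSResNet is not specified in the abstract.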

Funder

Provincial Natural Science Foundation of Anhui

Natural Science Foundation of Anhui Province of China

Monash University, Taif University Researchers

National Institute of Allergy and Infectious Diseases of the National Institutes of Health

Australian Research Council

National Health and Medical Research Council of Australia

Foundation of National Defense Key Laboratory of Science and Technology

Natural Science Foundation of Jiangsu

National Natural Science Foundation of China

Publisher

Oxford University Press (OUP)

Subject

Molecular Biology, Information Systems
