DrivR-Base: A Feature Extraction Toolkit For Variant Effect Prediction Model Construction

Published: 2024-01-17

Author:

Francis Amy, Campbell Colin, Gaunt Tom

Abstract

Motivation: Recent advancements in sequencing technologies have led to the discovery of numerous variants in the human genome. However, understanding their precise roles in diseases remains challenging due to their complex functional mechanisms. Various methodologies have emerged to predict the pathogenic significance of these genetic variants. Typically, these methods employ an integrative approach, leveraging diverse data sources that provide critical insights into genomic function. Despite the abundance of publicly available data sources and databases, the process of navigating, extracting, and pre-processing features for machine learning models can be daunting. Furthermore, researchers often invest substantial effort in feature extraction, only to later discover that these features lack informativeness.

Results: In this paper, we present DrivR-Base, an innovative resource that efficiently extracts and integrates molecular information (features) for single nucleotide variants from a wide range of databases and tools, including AlphaFold, ENCODE, and Variant Effect Predictor. The resulting features can be used as input for machine learning models designed to predict the pathogenic impact of human genome variants in disease. Moreover, these feature sets have applications beyond this, including haploinsufficiency prediction and the development of drug repurposing tools. We describe the resource’s development, practical applications, and potential for future expansion and enhancement.

Availability and Implementation: DrivR-Base source code is available at https://github.com/amyfrancis97/DrivR-Base.

Publisher

Cold Spring Harbor Laboratory
