Abstract
The COVID-19 pandemic has emphasized the importance of accurate detection of known and emerging pathogens. However, robust characterization of pathogenic sequences remains an open challenge. To address this need, we developed SeqScreen, which accurately characterizes short nucleotide sequences using taxonomic and functional labels and a customized set of curated Functions of Sequences of Concern (FunSoCs) specific to microbial pathogenesis. We show that our ensemble machine learning model can label protein-coding sequences with FunSoCs with high recall and precision. SeqScreen is a step towards a novel paradigm of functionally informed synthetic DNA screening and pathogen characterization, and is available for download at www.gitlab.com/treangenlab/seqscreen.
Funder
Intelligence Advanced Research Projects Activity
U.S. National Library of Medicine
Division of Computer and Network Systems
Directorate for Biological Sciences
Division of Intramural Research, National Institute of Allergy and Infectious Diseases
Publisher
Springer Science and Business Media LLC
Cited by
17 articles.