Recommendations for Uniform Variant Calling of SARS-CoV-2 Genome Sequence across Bioinformatic Workflows

Author:

Connor Ryan (1), Shakya Migun (2), Yarmosh David A. (3, 4), Maier Wolfgang (5), Martin Ross (6), Bradford Rebecca (3, 4), Brister J. Rodney (1), Chain Patrick S. G. (2), Copeland Courtney A. (7), di Iulio Julia (8), Hu Bin (2), Ebert Philip (9), Gunti Jonathan (1), Jin Yumi (1), Katz Kenneth S. (1), Kochergin Andrey (1), LaRosa Tré (7), Li Jiani (6), Li Po-E (2), Lo Chien-Chi (2), Rashid Sujatha (3), Maiorova Evguenia S. (6), Xiao Chunlin (1), Zalunin Vadim (1), Purcell Lisa (8), Pruitt Kim D. (1)

Affiliation:

1. National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA

2. Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA

3. American Type Culture Collection, Manassas, VA 20110, USA

4. BEI Resources, Manassas, VA 20110, USA

5. Galaxy Europe Team, University of Freiburg, 79085 Freiburg, Germany

6. Clinical Virology Department, Gilead Sciences, Foster City, CA 94404, USA

7. Deloitte Consulting LLP, Rosslyn, VA 22209, USA

8. Vir Biotechnology Inc., San Francisco, CA 94158, USA

9. Eli Lilly and Company, Indianapolis, IN 46225, USA

Abstract

Genomic sequencing of clinical samples to identify emerging variants of SARS-CoV-2 has been a key public health tool for curbing the spread of the virus. As a result, an unprecedented number of SARS-CoV-2 genomes were sequenced during the COVID-19 pandemic, which allowed rapid identification of genetic variants and enabled the timely design and testing of therapies and the deployment of new vaccine formulations to combat new variants. However, despite the technological advances of deep sequencing, the analysis of the raw sequence data generated globally is neither standardized nor consistent, leading to vastly disparate sequences that may impact the identification of variants. Here, we show that for both Illumina and Oxford Nanopore sequencing platforms, downstream bioinformatic protocols used by industry, government, and academic groups resulted in different virus sequences from the same sample. These bioinformatic workflows produced consensus genomes that differed in single nucleotide polymorphisms and in the inclusion or exclusion of insertions and/or deletions, despite using the same raw sequence datasets as input. We compare and characterize these discrepancies and propose a specific suite of parameters and protocols that should be adopted across the field. Consistent results from bioinformatic workflows are fundamental to SARS-CoV-2 and future pathogen surveillance efforts, including pandemic preparation, to allow for a data-driven and timely public health response.

Funder

National Center for Biotechnology Information of the National Library of Medicine (NLM), National Institutes of Health

European Union’s Horizon 2020 and Horizon Europe research and innovation programs

National Institute of Allergy and Infectious Diseases

Los Alamos National Laboratory’s Laboratory-Directed Research and Development program

Centers for Disease Control and Prevention

Publisher

MDPI AG

