Abstract
The SARS-CoV-2 pandemic has brought molecular biology and genomic sequencing into the public consciousness and lexicon. With an emphasis on rapid turnaround, genomic data informed both diagnostic and surveillance decisions during the pandemic at a previously unheard-of scale. The surge in submissions of genomic data to publicly available databases proved essential, as comparing genome sequences offers a wealth of knowledge, including phylogenetic links, modes of transmission, rates of evolution, and the impact of mutations on infection and disease severity. However, the scale of the pandemic has meant that sequencing runs are rarely repeated because of limited sample material and/or limited access to sequencing resources, and as a result some imperfect runs have been uploaded to public repositories. It is therefore crucial to examine the data obtained from such imperfect runs to determine whether the results are reliable before depositing them in a public database. As the number of next-generation sequencing (NGS) studies has grown, along with the diversity of sequencing technologies and procedures, numerous studies have identified a variety of sources of contamination in public NGS data. In this study, we conducted an in silico experiment with known SARS-CoV-2 sequences generated by Oxford Nanopore Technologies sequencing to investigate the effect of contamination on lineage calls and single nucleotide variants (SNVs). We identified a contamination threshold below which runs are expected to generate accurate lineage calls and to maintain genome relatedness and integrity. Together, these findings provide a benchmark below which imperfect runs may be considered robust enough for reporting results to both stakeholders and public repositories, reducing the need for repeated or wasted runs.
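The abstract does not state how contamination was simulated in the in silico experiment. Purely as an illustration, the minimal sketch below assumes that a contaminated run can be modelled by combining reads from a target sample with reads drawn from a second, contaminating sample at a chosen contamination fraction, with reads represented as plain strings. The function `mix_reads` and its parameters are hypothetical and are not the authors' published method.

```python
import random
from typing import List, Sequence


def mix_reads(
    target: Sequence[str],
    contaminant: Sequence[str],
    fraction: float,
    total: int,
    seed: int = 0,
) -> List[str]:
    """Build a hypothetical contaminated read set (illustrative assumption only).

    Returns `total` reads, of which round(fraction * total) are sampled without
    replacement from `contaminant` and the remainder from `target`.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    n_contam = round(fraction * total)
    n_target = total - n_contam
    if n_contam > len(contaminant) or n_target > len(target):
        raise ValueError("not enough reads to sample without replacement")
    rng = random.Random(seed)  # fixed seed so each simulated run is reproducible
    mixed = rng.sample(list(target), n_target) + rng.sample(list(contaminant), n_contam)
    rng.shuffle(mixed)  # interleave so read order does not reveal the source sample
    return mixed


if __name__ == "__main__":
    # Toy reads labelled by source so the achieved contamination level can be checked.
    target_reads = [f"target_{i}" for i in range(100)]
    contam_reads = [f"contam_{i}" for i in range(100)]
    for frac in (0.0, 0.05, 0.10, 0.25):
        run = mix_reads(target_reads, contam_reads, frac, total=80, seed=42)
        observed = sum(r.startswith("contam_") for r in run) / len(run)
        print(f"requested {frac:.2f} -> observed {observed:.3f}")
```

Under this assumption, each mixed read set would then go through the same downstream analysis as the uncontaminated run, so that lineage calls and SNVs can be compared across a range of contamination fractions to locate a threshold.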
Funder
Public Health Agency of Canada
Genome Canada
Publisher
Public Library of Science (PLoS)