Author:
Vilar Santiago, Isom Daniel G.
Abstract
The SARS-CoV-2 coronavirus has caused a worldwide crisis with profound effects on both healthcare and the economy. To combat the COVID-19 pandemic, research groups have shared viral genome sequence data through the GISAID initiative. We collected and computationally profiled ∼223,000 full SARS-CoV-2 proteome sequences from GISAID over one year for emergent nonsynonymous mutations. Our analysis shows that SARS-CoV-2 proteins are mutating at substantially different rates, with most viral proteins exhibiting little mutational variability. As anticipated, our calculations capture previously reported mutations that occurred in the first period of the pandemic, such as D614G (Spike), P323L (NSP12), and R203K/G204R (Nucleocapsid), but also identify recent mutations such as A222V and L18F (Spike) and A220V (Nucleocapsid). Our comprehensive temporal and geographical analyses show two periods with different mutations in the SARS-CoV-2 proteome: December 2019 to June 2020 and July to November 2020. Some mutation rates also differ by geography; the main mutations in the second period occurred in Europe. Furthermore, our structure-based molecular analysis provides an exhaustive assessment of mutations in the context of 3D protein structure. Emerging sequence-to-structure data is beginning to reveal the site-specific mutational tolerance of SARS-CoV-2 proteins as the virus continues to spread around the globe.
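As a rough illustration of what profiling protein sequences for nonsynonymous mutations can involve, the Python sketch below compares pre-aligned protein sequences against a reference and reports how often each amino acid substitution appears. It is not the authors' pipeline: the function names, the toy sequences, and the conventions for gaps ('-') and unknown residues ('X') are assumptions made for this example, and it presumes the reference itself contains no gaps so that positions match reference numbering.

```python
from collections import Counter


def call_substitutions(reference: str, sample: str) -> list[str]:
    """List substitutions (e.g. 'D614G') between two pre-aligned protein sequences.

    Assumption: both strings have equal length and the reference has no gaps,
    so the 1-based index is the reference residue number.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    substitutions = []
    for position, (ref_aa, alt_aa) in enumerate(zip(reference, sample), start=1):
        # Skip identical residues, gaps, and ambiguous calls.
        if ref_aa == alt_aa or ref_aa in "-X" or alt_aa in "-X":
            continue
        substitutions.append(f"{ref_aa}{position}{alt_aa}")
    return substitutions


def mutation_frequencies(reference: str, samples: list[str]) -> dict[str, float]:
    """Return the fraction of samples that carry each substitution."""
    counts = Counter()
    for sample in samples:
        counts.update(call_substitutions(reference, sample))
    total = len(samples)
    return {mutation: count / total for mutation, count in counts.items()}


# Toy data (hypothetical, not real SARS-CoV-2 sequences).
reference = "MDLAG"
samples = ["MGLAG", "MGLAG", "MDLVG", "MDLAG"]
frequencies = mutation_frequencies(reference, samples)
```

Grouping the samples by collection date or region before calling `mutation_frequencies` would give a simple temporal or geographical breakdown of the kind the abstract describes.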
Publisher
Cold Spring Harbor Laboratory
Cited by
5 articles.