Predicting the Trend of SARS-CoV-2 Mutation Frequencies Using Historical Data

Author:

Zhou Xinyu, Hu Kevin, Pan Minmin, Li Yajie, Zhang Chi, Cao Sha

Abstract

As the SARS-CoV-2 virus rapidly evolves, predicting the trajectory of viral variation has become a critical yet complex task. A deep understanding of future mutation patterns, in particular of the mutations that will prevail in the near future, is vital for steering diagnostics, therapeutics, and vaccine strategies in the coming months.

In this study, we developed a model to forecast future SARS-CoV-2 mutation surges in real time, using historical mutation frequency data from the USA. To improve on the accuracy of traditional time-series models, we recast the prediction problem as a supervised learning problem using a sliding window approach: the time series of mutation frequencies is broken into very short segments. Because the data are time-dependent, we modeled the first-order derivative of the mutation frequency rather than the frequency itself. Within each segment, the final derivative is predicted from the preceding derivatives, using machine learning methods that include random forest, XGBoost, support vector machine, and neural network models. With this transformation and the high capacity of the machine learning models, the prediction error stayed within 0.1% for forecasts 30 days ahead and within 1% for forecasts 80 days ahead. The method also improved notably on traditional time-series models, as shown by lower mean absolute error (MAE) and mean squared error (MSE) across prediction horizons. To further assess the method's effectiveness and robustness on mutations not seen in training, we grouped all mutations into three major patterns. When trained on data from two pattern categories and tested on the third, the model accurately predicted the unseen pattern, which suggests it can forecast a variety of mutation trajectories.

To enhance accessibility and utility, we built our methodology into an R-shiny app (https://swdatpredicts.shinyapps.io/rshiny_predict/), a tool that may also apply to the study of other infectious diseases, extending its relevance beyond the current pandemic.
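
The sliding window transformation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the window width, the recursive multi-step forecast, the clipping of frequencies to [0, 1], and all function names are assumptions made for illustration, and the mean-of-window predictor is only a stand-in for a fitted model such as a random forest.

```python
from typing import Callable, List, Sequence, Tuple


def first_differences(freq: Sequence[float]) -> List[float]:
    """Discrete first-order derivative: d[t] = freq[t + 1] - freq[t]."""
    return [b - a for a, b in zip(freq, freq[1:])]


def sliding_windows(
    deriv: Sequence[float], width: int
) -> Tuple[List[List[float]], List[float]]:
    """Cut the derivative series into overlapping segments of `width` values.

    In each segment the last derivative is the target and the preceding
    derivatives are the features. `width` is an assumed hyperparameter.
    """
    if width < 2:
        raise ValueError("width must be at least 2")
    features, targets = [], []
    for start in range(len(deriv) - width + 1):
        segment = list(deriv[start:start + width])
        features.append(segment[:-1])
        targets.append(segment[-1])
    return features, targets


def forecast(
    freq: Sequence[float],
    width: int,
    predict: Callable[[List[float]], float],
    horizon: int,
) -> List[float]:
    """Recursive forecast (an assumed scheme): predict the next derivative
    from the last `width - 1` derivatives, add it to the current frequency,
    and feed the prediction back in for the next step."""
    if width < 2:
        raise ValueError("width must be at least 2")
    deriv = first_differences(freq)
    level = freq[-1]
    path = []
    for _ in range(horizon):
        step = predict(deriv[-(width - 1):])
        deriv.append(step)
        # Frequencies are proportions, so keep them in [0, 1] (assumption).
        level = min(1.0, max(0.0, level + step))
        path.append(level)
    return path


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mean_predictor(window: List[float]) -> float:
    """Stand-in for a trained model: average of the preceding derivatives."""
    return sum(window) / len(window)


# Toy daily frequencies for one hypothetical mutation.
history = [0.0, 0.0625, 0.125, 0.25, 0.375]
X, y = sliding_windows(first_differences(history), width=3)
predicted = forecast(history, width=3, predict=mean_predictor, horizon=2)
```

In practice, `X` and `y` would be pooled across many mutations and passed to a regression model, and `mean_predictor` would be replaced by that model's prediction function. The `mae` and `mse` helpers then compare the forecast path against held-out frequencies.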

Publisher

Cold Spring Harbor Laboratory
