Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.04.19.537460v1?rss=1
Authors: Li, L., Li, C., Li, N., Zou, D., Zhao, W., Xue, Y., Zhang, Z., Bao, Y., Song, S.
Abstract: The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has evolved many high-risk variants, resulting in repeated waves of the COVID-19 pandemic over the past years. Accurate early warning of high-risk variants is therefore vital for epidemic prevention and control. Here we construct a machine learning model that predicts high-risk variants of SARS-CoV-2, using the LightGBM algorithm trained on several important haplotype network features. As demonstrated on a series of different retrospective testing datasets, our model achieves accurate prediction of all variants of concern (VOC) and most variants of interest (AUC=0.96). Prediction based on the latest sequences shows that the newly emerging lineage BA.5 has the highest risk score and is spreading rapidly to become a major epidemic lineage in multiple countries, suggesting that BA.5 has great potential to become a VOC. In sum, our machine learning model is capable of predicting high-risk variants soon after their emergence, thus greatly improving public health preparedness against the evolving virus.
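For listeners unfamiliar with the AUC figure quoted in the abstract, the following is a minimal sketch of how an area under the ROC curve can be computed from risk scores. It is not the authors' code: the function name `roc_auc`, the labels, and the scores are all hypothetical, chosen only to illustrate the metric.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a
    randomly chosen positive example receives a higher score than a
    randomly chosen negative one (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = lineage later designated high-risk, 0 = not.
# The scores stand in for a model's predicted risk; they are made up.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 1.0 would mean every high-risk lineage is scored above every other lineage, while 0.5 corresponds to random ranking.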
Copyright belongs to the original authors. Visit the link for more info.
Podcast created by Paper Player, LLC