Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.01.31.526458v1?rss=1
Authors: de Oliveira Martins, L., Mather, A. E., Page, A. J.
Abstract: Despite millions of SARS-CoV-2 genomes being sequenced and shared globally, manipulating such data sets is still challenging, especially selecting sequences for focused phylogenetic analysis. We present a novel method, uvaia, which is based on partial and exact sequence similarity for quickly extracting database sequences similar to query sequences of interest. Many SARS-CoV-2 phylogenetic analyses rely on very low numbers of ambiguous sites as a measure of quality, since ambiguous sites do not contribute to single nucleotide polymorphism (SNP) differences. Uvaia alleviates this by using measures of sequence similarity that consider partially ambiguous sites. Such a fine-grained definition of similarity allows not only for better phylogenetic analyses, but also for improved classification and biogeographical inferences. Uvaia works natively with compressed files, can use multiple cores, and uses memory efficiently, so it can analyse large data sets on a standard desktop.
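To make the idea of similarity that accounts for partially ambiguous sites concrete, here is a minimal Python sketch. It is not uvaia's implementation or scoring scheme, which the abstract does not describe in detail. The function name `compare_sites`, the three categories it counts, and the choice to treat gaps and unknown characters as fully ambiguous are all assumptions made for illustration. It encodes the standard IUPAC nucleotide codes as bitmasks, so two sites are compatible when their masks share at least one base.

```python
# Hypothetical illustration only; not uvaia's actual code or distance measures.
# Each IUPAC nucleotide code is a 4-bit mask over the bases A, C, G, T.
IUPAC = {
    "A": 0b0001, "C": 0b0010, "G": 0b0100, "T": 0b1000,
    "R": 0b0101, "Y": 0b1010, "S": 0b0110, "W": 0b1001,
    "K": 0b1100, "M": 0b0011, "B": 0b1110, "D": 0b1101,
    "H": 0b1011, "V": 0b0111, "N": 0b1111,
}
FULLY_AMBIGUOUS = 0b1111


def compare_sites(query, ref):
    """Return (exact, partial, mismatch) site counts for two aligned sequences.

    exact:    both sites are the same unambiguous base
    partial:  the sites are compatible, but at least one is ambiguous
    mismatch: the sites share no possible base (a SNP difference)
    """
    if len(query) != len(ref):
        raise ValueError("sequences must be aligned to the same length")
    exact = partial = mismatch = 0
    for q, r in zip(query.upper(), ref.upper()):
        # Assumption: gaps and unrecognised characters count as fully ambiguous.
        qm = IUPAC.get(q, FULLY_AMBIGUOUS)
        rm = IUPAC.get(r, FULLY_AMBIGUOUS)
        if qm & rm == 0:
            mismatch += 1
        elif qm == rm and bin(qm).count("1") == 1:
            exact += 1
        else:
            partial += 1
    return exact, partial, mismatch
```

For example, `compare_sites("ACGTR", "ACGAA")` counts three exact matches, one partial match (R is compatible with A) and one mismatch (T versus A). A plain SNP count would see only the single mismatch and ignore the ambiguous site entirely.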
Copyrights belong to the original authors. Visit the link for more info.
Podcast created by Paper Player, LLC