Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.07.22.550164v1?rss=1
Authors: van Bemmelen, J., Smyth, D. S., Baaijens, J. A.
Abstract: Metagenomic profiling algorithms commonly rely on genomic differences between lineages, strains or species to infer the relative abundances of sequences present in a sample. This observation plays an important role in the analysis of diverse microbial communities, where targeted sequencing of 16S and 18S rRNA, both well-known hypervariable genomic regions, has led to insights into microbial diversity and the discovery of novel organisms. However, the variable nature of discriminatory regions can also act as a double-edged sword, as the sought-after variability can make it difficult to design primers for their amplification through PCR. Moreover, the most variable regions are not necessarily the most informative regions for the purpose of differentiation; one should focus on regions that maximize the number of lineages that can be distinguished. Here we present AmpliDiff, a computational tool that simultaneously finds such highly discriminatory genomic regions, as well as primers allowing for the amplification of these regions. We show that regions and primers found by AmpliDiff can be used to accurately estimate relative abundances of SARS-CoV-2 lineages, for example in wastewater sequencing data. We obtain mean absolute prediction errors that are comparable to those obtained when using whole-genome information to estimate relative abundances. Furthermore, our results show that AmpliDiff is robust against incomplete input data, and that primers designed by AmpliDiff continue to bind to genomes originating from months after the primers were selected. With AmpliDiff we provide an effective and efficient alternative to whole-genome sequencing for estimating lineage abundances in viral metagenomes.
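Illustration (not from the paper): the abstract's point that one should pick regions that maximize the number of lineages that can be distinguished can be made concrete with a toy sketch. The code below is not AmpliDiff's algorithm and ignores primer design entirely. It assumes pre-aligned, equal-length sequences, and the names `count_distinguished_pairs`, `best_window`, and the three toy genomes are hypothetical. It scores every fixed-width window by how many pairs of sequences from different lineages differ inside it, then returns the best-scoring window.

```python
from itertools import combinations


def count_distinguished_pairs(genomes, start, width):
    """Count pairs of sequences from different lineages that differ
    somewhere inside the window [start, start + width)."""
    end = start + width
    count = 0
    for (lin_a, seq_a), (lin_b, seq_b) in combinations(genomes, 2):
        if lin_a != lin_b and seq_a[start:end] != seq_b[start:end]:
            count += 1
    return count


def best_window(genomes, width):
    """Return (start, score) of the window distinguishing the most pairs.
    Ties are broken in favour of the leftmost window."""
    length = len(genomes[0][1])
    best_start, best_score = 0, -1
    for start in range(length - width + 1):
        score = count_distinguished_pairs(genomes, start, width)
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score


# Toy, made-up aligned sequences labelled with hypothetical lineages.
genomes = [
    ("A", "ACGTACGTAC"),
    ("B", "ACGTACGTTC"),
    ("C", "ACGAACGTTG"),
]

start, score = best_window(genomes, width=3)
print(start, score)
```

In this toy input, only the window covering the last three positions separates all three lineage pairs. A real tool would additionally have to check that primers can be placed around the chosen region, which this sketch does not attempt.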
Copyright belongs to the original authors. Visit the link for more info.
Podcast created by Paper Player, LLC