Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.07.20.549822v1?rss=1
Authors: Ulrich, J.-U., Renard, B. Y.
Abstract: Metagenomic long-read sequencing is gaining popularity for various applications, including pathogen detection and microbiome studies. To analyze the large volumes of data generated in these studies, software tools need to taxonomically classify the sequenced molecules and estimate the relative abundances of organisms in the sequenced sample. Due to the exponential growth of reference genome databases, current taxonomic classification methods have large computational requirements. This issue motivated us to develop a new data structure for fast and memory-efficient querying of long reads. Here we present Taxor, a new tool for long-read metagenomic classification that uses a hierarchical interleaved XOR filter data structure for indexing and querying large reference genome sets. Taxor implements several k-mer-based approaches, such as syncmers for pseudo-alignment to classify reads, and an Expectation-Maximization algorithm for metagenomic profiling. Our results show that Taxor outperforms competing short- and long-read tools regarding precision while having a similar recall. Most notably, Taxor reduces the memory requirements and index size by more than 50% and is among the fastest tools regarding query times. This enables real-time metagenomics analysis with large reference databases on a small laptop in the field. Taxor is available at https://gitlab.com/dacs-hpi/taxor.
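The abstract names syncmers as one of the k-mer-based approaches Taxor uses. As a rough illustration of the general idea only, the Python sketch below selects open syncmers: a k-mer is kept when its smallest s-mer sits at a fixed offset inside it. This is not Taxor's code. The function name open_syncmers, the parameters k, s and offset, the lexicographic ordering of s-mers and the toy sequence are all assumptions made for this example; real tools typically order s-mers by a hash value and use much larger k.

```python
def open_syncmers(seq, k, s, offset=0):
    """Return (position, k-mer) pairs for the open syncmers of seq.

    A k-mer is selected when its smallest s-mer (leftmost on ties)
    starts at `offset` within the k-mer. Lexicographic order is an
    assumption of this sketch; hashing the s-mers is more common.
    """
    if not 0 < s <= k:
        raise ValueError("s must satisfy 0 < s <= k")
    if not 0 <= offset <= k - s:
        raise ValueError("offset must lie within the k-mer")

    selected = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # All s-mers contained in this k-mer, in order of position.
        smers = [kmer[j:j + s] for j in range(k - s + 1)]
        # list.index returns the leftmost position of the minimum.
        if smers.index(min(smers)) == offset:
            selected.append((i, kmer))
    return selected


if __name__ == "__main__":
    # Hypothetical toy read; parameters chosen only to keep the output small.
    for pos, kmer in open_syncmers("ACGTACGT", k=4, s=2):
        print(pos, kmer)
```

Because selection depends only on the k-mer itself, the same k-mer is chosen wherever it occurs, in a read or in a reference genome. That property is what makes syncmers useful for subsampling k-mers before they are looked up in an index.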
Copyright belongs to the original authors. Visit the link for more info.
Podcast created by Paper Player, LLC