GSearch: Ultra-Fast and Scalable Microbial Genome Search by combining Kmer Hashing with Hierarchical Navigable Small World Graphs

2022/10/21

PaperPlayer biorxiv bioinformatics

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2022.10.21.513218v1?rss=1

Authors: Zhao, J., Pierre-Both, J., Rodriguez-R, L. M., Konstantinidis, K. T.

Abstract: Genome search and/or classification is a key step in microbiome studies. It has become more challenging in recent years because the number of available (reference) genomes keeps increasing and traditional methods do not scale well with larger databases. By combining a k-mer hashing-based genomic distance metric (ProbMinHash) with a graph-based nearest neighbor search (NNS) algorithm (Hierarchical Navigable Small World Graphs, HNSW), we developed a new program, GSearch, that is at least ten times faster than alternative tools for the same purposes while maintaining high accuracy. GSearch can identify/classify eight thousand query genomes against all available microbial and viral genomic species within several minutes on a personal laptop, using only ~6 GB of memory. Further, GSearch scales well to millions of database genomes by using a database splitting strategy. Therefore, GSearch solves a major bottleneck in current and future microbiome studies that require genome search and/or classification.
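
To make the two ingredients in the abstract concrete, here is a minimal Python sketch of the general idea: sketch each genome's k-mers by hashing, then find the closest database genome by sketch distance. It is not GSearch's implementation. It swaps in classic MinHash (unweighted Jaccard) for ProbMinHash and a brute-force scan for the HNSW graph search, and every function name, the value of `k`, and the sketch size are assumptions chosen for illustration.

```python
import hashlib

def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def _hash(kmer, seed):
    """Deterministic 64-bit hash of a k-mer under a given seed."""
    data = f"{seed}:{kmer}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

def minhash_sketch(seq, k=5, num_hashes=64):
    """Classic MinHash: keep the minimum hash value per seeded hash function.

    Assumption: plain MinHash stands in for ProbMinHash, which additionally
    accounts for k-mer weights.
    """
    ks = kmers(seq, k)
    if not ks:
        raise ValueError("sequence is shorter than k")
    return [min(_hash(km, seed) for km in ks) for seed in range(num_hashes)]

def sketch_distance(a, b):
    """Estimated Jaccard distance: 1 minus the fraction of matching slots."""
    matches = sum(x == y for x, y in zip(a, b))
    return 1.0 - matches / len(a)

def nearest(query_sketch, database):
    """Brute-force nearest neighbor over a {name: sketch} dict.

    Assumption: a linear scan stands in for HNSW, which would answer the
    same question approximately without visiting every database entry.
    """
    return min(database, key=lambda name: sketch_distance(query_sketch, database[name]))

# Toy "genomes" (hypothetical sequences, not real data).
database = {
    "genome_a": minhash_sketch("ACGTACGTTAGCCGATAGCTAGGCTA"),
    "genome_b": minhash_sketch("TTTTTGGGGGCCCCCAAAAATTTTT"),
}
query = minhash_sketch("ACGTACGTTAGCCGATAGCTAGGCTA")
best = nearest(query, database)
```

In this toy setup the query is identical to `genome_a`, so its sketch matches in every slot and `nearest` selects it. The scaling described in the abstract comes from replacing the linear scan with a graph index, so that query cost grows much more slowly than the number of database genomes.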

Copyright belongs to the original authors. Visit the link for more info.

Podcast created by Paper Player, LLC