Extremely-fast construction and querying of compacted and colored de Bruijn graphs with GGCAT

2022/10/25
PaperPlayer biorxiv bioinformatics

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2022.10.24.513174v1?rss=1

Authors: Cracco, A., Tomescu, A. I.

Abstract: Compacted de Bruijn graphs are one of the most fundamental data structures in computational genomics. Colored compacted de Bruijn graphs are a variant built on a collection of sequences, and associate with each k-mer the sequences in which it appears. Here we present GGCAT, a tool for constructing both types of graphs. Compared to Cuttlefish 2 (Genome Biology, 2022), the state-of-the-art for constructing compacted de Bruijn graphs, GGCAT has a speedup of up to 3.4x for k = 63 and up to 20.8x for k = 255. Compared to Bifrost (Genome Biology, 2020), the state-of-the-art for constructing the colored variant, GGCAT achieves a speedup of up to 12.6x for k = 27. GGCAT is up to 480x faster than Bifrost for batch sequence queries on colored graphs. GGCAT is based on a new approach merging the k-mer counting step with the unitig construction step, and on many practical optimizations. GGCAT is implemented in Rust and is freely available at https://github.com/algbio/ggcat. This work was partially funded by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 851093, SAFEBIO), and partially by the Academy of Finland (grants No. 322595, 352821, 346968).
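To make the "colored" idea in the abstract concrete, here is a minimal Python sketch of a k-mer color index and a per-color query. It is an illustration only and not GGCAT's algorithm or API: the function names (`kmers`, `build_color_index`, `query`) and the toy sequences are assumptions made up for this example, and the sketch ignores reverse complements and performs no unitig compaction.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every length-k substring (k-mer) of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_color_index(sequences, k):
    """Map each k-mer to the set of colors (input-sequence indices) containing it."""
    index = defaultdict(set)
    for color, seq in enumerate(sequences):
        for kmer in kmers(seq, k):
            index[kmer].add(color)
    return index


def query(index, seq, k):
    """For each color, count how many k-mers of the query sequence it contains."""
    hits = defaultdict(int)
    for kmer in kmers(seq, k):
        # .get avoids inserting empty entries for k-mers absent from the index
        for color in index.get(kmer, ()):
            hits[color] += 1
    return dict(hits)


# Hypothetical toy input: two short sequences, colors 0 and 1.
sequences = ["ACGTAC", "GTACCA"]
index = build_color_index(sequences, k=3)

# "TACC" has 3-mers TAC (in both sequences) and ACC (only in sequence 1).
print(query(index, "TACC", k=3))  # prints {0: 1, 1: 2}
```

A hash map from k-mers to color sets is the simplest possible representation. Real tools store colors per unitig and compress the color sets, which is where much of the engineering effort described in the abstract goes.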

Copyrights belong to the original authors. Visit the link for more info.

Podcast created by Paper Player, LLC