Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.07.03.547471v1?rss=1
Authors: Le, D. Q., Nguyen, T. A., Nguyen, T. T., Do, V. H., Nguyen, C. H., Phung, H. T., Ho, T. H., Vo, N. S., Nguyen, T., Nguyen, H. A., Cao, M. D.
Abstract: Pangenome analysis has become indispensable in bacterial genomics due to the high variability of gene content between isolates within a clade. While many computational methods exist for constructing the pangenome from a bacterial genome collection, speed and scalability remain an issue for fast-growing genomic collections. Here, we present PanTA, an efficient method to build and analyze pangenomes of bacterial strains. We show that PanTA achieves an unprecedented 10-fold speedup and is 2 times more memory-efficient than current state-of-the-art methods. More importantly, PanTA enables progressive pangenome construction, where new samples are added to an existing pangenome without rebuilding the accumulated collection from scratch. Progressive building of pangenomes can further reduce memory requirements by half. We demonstrate that PanTA can build the pangenome of the Escherichia coli species from the entire collection of over 28,000 high-quality genomes collected from the RefSeq database. Crucially, the whole analysis is performed on a modest laptop computer within two days, highlighting the scalability and practicality of PanTA.
Copyrights belong to the original authors. Visit the link for more info.
Podcast created by Paper Player, LLC