ViReaDB: A user-friendly database for compactly storing viral sequence data and rapidly computing consensus genome sequences

2022/10/24

PaperPlayer biorxiv bioinformatics

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2022.10.21.513318v1?rss=1

Authors: Moshiri, N.

Abstract: Motivation: In viral molecular epidemiology, reconstructing consensus genomes from sequence data is critical for tracking mutations and variants of concern. However, raw sequence data can become prohibitively large to store, and computing consensus genomes from sequence data can be slow and require bioinformatics expertise.

Results: ViReaDB is a user-friendly database system for compactly storing viral sequence data and rapidly computing consensus genome sequences. From a dataset of 1 million trimmed, mapped SARS-CoV-2 reads, it can compute the base counts and the consensus genome in 16 minutes, store the reads alongside the base counts and consensus in 50 MB, and optionally store just the base counts and consensus (without the reads) in 300 KB.

Availability: ViReaDB is freely available on PyPI (https://pypi.org/project/vireadb) and on GitHub (https://github.com/niemasd/ViReaDB) as an open-source Python software project.
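The abstract describes computing per-position base counts from mapped reads and then deriving a consensus genome from those counts. The sketch below illustrates that general idea in plain Python. It is not ViReaDB's API or implementation. It assumes each read is given as a hypothetical `(0-based start, sequence)` pair that is already trimmed and mapped with no insertions or deletions, and the `min_depth` and `min_freq` thresholds are illustrative choices. It also suggests why storing only counts can be much smaller than storing reads: the counts table grows with genome length, not with the number of reads.

```python
from collections import Counter
from typing import List, Sequence, Tuple

BASES = "ACGT"


def base_counts(reads: Sequence[Tuple[int, str]], ref_len: int) -> List[Counter]:
    """Tally A/C/G/T at each reference position.

    Assumption (not from ViReaDB): each read is a (0-based start, sequence)
    pair that is already trimmed and mapped, with no insertions or deletions.
    """
    counts = [Counter() for _ in range(ref_len)]
    for start, seq in reads:
        for offset, base in enumerate(seq.upper()):
            pos = start + offset
            # Skip positions past the reference end and non-ACGT symbols (e.g. N).
            if 0 <= pos < ref_len and base in BASES:
                counts[pos][base] += 1
    return counts


def consensus(counts: Sequence[Counter], min_depth: int = 10,
              min_freq: float = 0.5) -> str:
    """Majority-rule consensus; emit 'N' where coverage or agreement is too low.

    min_depth and min_freq are hypothetical thresholds chosen for illustration.
    """
    out = []
    for c in counts:
        depth = sum(c.values())
        if depth == 0 or depth < min_depth:
            out.append("N")  # not enough coverage to call a base
            continue
        base, n = c.most_common(1)[0]
        # Require a strict majority so ties are reported as ambiguous.
        out.append(base if n / depth > min_freq else "N")
    return "".join(out)
```

For example, four short reads over a 6-position reference yield the consensus `ACGTAN` when `min_depth=2`; the last position is covered by only one read, so it is reported as `N`:

```python
reads = [(0, "ACGT"), (1, "CGTA"), (2, "GTAA"), (0, "ACTT")]
print(consensus(base_counts(reads, 6), min_depth=2))  # ACGTAN
```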

Copyright belongs to the original authors. Visit the link for more information.

Podcast created by Paper Player, LLC