FM3VCF: A Software Library for Accelerating the Loading of Large VCF Files in Genotype Data Analyses

2023/6/27
PaperPlayer biorxiv bioinformatics

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.06.25.546413v1?rss=1

Authors: Zuo, Z., Li, Q., Li, Z., Huang, M., Tang, Y.

Abstract:

Background: As genotype datasets grow, loading VCF files has become a computational bottleneck in various analyses, including imputation and genome-wide association studies (GWAS). To address this issue, we developed FM3VCF (fast M3VCF), a software library that uses multiple CPU threads to accelerate this process.

Findings: FM3VCF can convert VCF files into M3VCF[1], the data format exclusive to MINIMAC4[1], and can efficiently read and parse data from VCF files. Compared with m3vcftools[1], FM3VCF is approximately 20 times faster at compressing VCF files to M3VCF format. Furthermore, for reading compressed VCF files, including decompression and parsing, FM3VCF is approximately 3 times faster than HTSlib[2]. FM3VCF is written in C and is open-source, available for download from https://github.com/Oliver-111/m3vcf under the MIT/BSD license.

Conclusion: FM3VCF is a powerful tool for accelerating the loading of large VCF files in genotype data analyses. By fully utilizing multiple CPU threads, FM3VCF can significantly reduce the computational burden in various genomic analyses.

Copyright belongs to the original authors. Visit the link for more info.

Podcast created by Paper Player, LLC