Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.03.01.530618v1?rss=1
Authors: Wang, T., Zhang, Y., Wang, H., Zheng, Q., Yang, J., Zhang, T., Sun, G., Liu, W., Yin, L., He, X., You, R., Wang, C., Liu, Z., Liu, Z., Wang, J., Jin, X., He, Z.
Abstract:
Background: Whole genome sequencing (WGS) is increasingly used for molecular diagnosis, staging and prognosis because its cost is falling and it can detect nearly all genes associated with a patient's disease. GATK, currently the most widely accepted variant calling pipeline, is limited in computational speed and efficiency and cannot keep pace with growing analysis demands.
Methods: In this study, we propose a fast and accurate DNASeq variant calling workflow composed entirely of tools from the LUSH toolkit. The LUSH pipeline is a highly optimized version of the WGS pipeline based on SOAPnuke, BWA and GATK, and it can be deployed on any general-purpose CPU-based computing system. We validated the accuracy, speed and scalability of the LUSH pipeline on several standard WGS datasets.
Results: Our test results show that the LUSH and GATK pipelines are highly consistent in accuracy, achieving over 99% precision and recall on NA12878. In terms of speed, the LUSH pipeline processes 30x WGS data in 1.6 hours, about 17x faster than the GATK pipeline. For the BAM-to-VCF step alone, LUSH_HC takes only 15 minutes, about 76x faster than GATK. Moreover, the LUSH pipeline scales well with both thread count and sequencing depth.
Conclusion: The LUSH pipeline offers far greater computational speed than GATK while maintaining comparable accuracy, which greatly facilitates bedside analysis of acute patients, large-scale cohort data analysis, and variant calling in crop breeding programs.
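The headline numbers in the abstract rest on two simple calculations. The Python sketch that follows is illustrative only: it shows the standard precision and recall definitions used in variant-calling benchmarks, and the GATK runtimes implied by the stated speedups. The helper names and the variant counts are hypothetical, not from the paper. Because the reported speedups are approximate ("about 17x", "about 76x"), the derived GATK times (roughly 27 hours for the full pipeline and 19 hours for BAM to VCF) are back-of-envelope estimates, not figures reported by the authors.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Standard variant-calling benchmark metrics.

    precision = TP / (TP + FP): fraction of called variants found in the truth set.
    recall    = TP / (TP + FN): fraction of truth-set variants that were called.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall


def implied_baseline_hours(fast_hours: float, speedup: float) -> float:
    """Runtime of the slower pipeline implied by a reported speedup ratio (hypothetical helper)."""
    return fast_hours * speedup


# Hypothetical counts, NOT taken from the paper: 3,000,000 truth-set variants,
# 2,985,000 of them called correctly (15,000 missed), plus 20,000 false calls.
p, r = precision_recall(tp=2_985_000, fp=20_000, fn=15_000)
print(f"precision={p:.4f} recall={r:.4f}")

# Rough estimates derived only from the abstract's rounded numbers.
print(f"full WGS pipeline: ~{implied_baseline_hours(1.6, 17):.1f} h for GATK")
print(f"BAM to VCF: ~{implied_baseline_hours(15 / 60, 76):.1f} h for GATK")
```

"Over 99% precision and recall" means each ratio exceeds 0.99 on its own; a caller can score high on one and low on the other, which is why benchmarks report both.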
Copyright belongs to the original authors. Visit the link for more info.
Podcast created by Paper Player, LLC