vcfdist: Accurately benchmarking phased small variant calls in human genomes

2023/3/12

PaperPlayer biorxiv bioinformatics


Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.03.10.532078v1?rss=1

Authors: Dunn, T., Narayanasamy, S.

Abstract: Accurately benchmarking small variant calling accuracy is critical for the continued improvement of human whole genome sequencing. In this work, we show that current variant calling evaluations are biased towards certain variant representations and may misrepresent the relative performance of different variant calling pipelines. We propose solutions, first exploring the affine gap parameter design space for complex variant representation and suggesting a standard. Next, we present our tool "vcfdist" and demonstrate the importance of enforcing local phasing for evaluation accuracy. We then introduce the notion of partial credit for mostly-correct calls and present an algorithm for clustering dependent variants. Lastly, we motivate using alignment distance metrics to supplement precision-recall curves for understanding variant calling performance. We evaluate the performance of 64 phased "Truth Challenge V2" submissions and show that vcfdist improves measured (SNP, INDEL) performance consistency across variant representations from R² = (0.14542, 0.97243) for vcfeval to (0.99999, 0.99996) for vcfdist.
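
To make the phrase "variant representations" concrete, here is a minimal Python sketch. It is not taken from the paper or from vcfdist. The `apply_variants` helper, the toy reference sequence, and the simplified 0-based `(position, ref_allele, alt_allele)` tuples are all assumptions made for illustration; real VCF records use 1-based positions and anchor bases. The sketch shows that different sets of calls can describe exactly the same haplotype, which is why an evaluation that compares records one by one may score equivalent call sets differently.

```python
def apply_variants(reference, variants):
    """Apply non-overlapping (pos, ref_allele, alt_allele) edits to a sequence.

    Positions are 0-based offsets into `reference` (a simplification, not VCF).
    """
    pieces = []
    cursor = 0
    for pos, ref_allele, alt_allele in sorted(variants):
        if pos < cursor:
            raise ValueError(f"variant at {pos} overlaps a previous variant")
        if reference[pos:pos + len(ref_allele)] != ref_allele:
            raise ValueError(f"reference allele mismatch at {pos}")
        pieces.append(reference[cursor:pos])  # unchanged bases before the variant
        pieces.append(alt_allele)             # substituted bases
        cursor = pos + len(ref_allele)
    pieces.append(reference[cursor:])         # unchanged tail
    return "".join(pieces)


REFERENCE = "GATTACA"  # hypothetical toy reference

# Two SNPs versus one complex substitution: same resulting haplotype.
two_snps = [(2, "T", "C"), (4, "A", "G")]
one_complex = [(2, "TTA", "CTG")]

# Deleting either T of the "TT" run: same resulting haplotype.
left_deletion = [(2, "T", "")]
right_deletion = [(3, "T", "")]

snp_haplotype = apply_variants(REFERENCE, two_snps)
complex_haplotype = apply_variants(REFERENCE, one_complex)
left_haplotype = apply_variants(REFERENCE, left_deletion)
right_haplotype = apply_variants(REFERENCE, right_deletion)
```

In this toy case `snp_haplotype` equals `complex_haplotype`, and `left_haplotype` equals `right_haplotype`, even though the underlying records differ in count, position, and allele strings.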

Copyrights belong to the original authors. Visit the link for more information.

Podcast created by Paper Player, LLC
