The impact of FASTQ and alignment read order on structural variation calling from long-read sequencing data

2023/3/29
PaperPlayer biorxiv bioinformatics

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.03.27.534439v1?rss=1

Authors: Lesack, K., Wasmuth, J.

Abstract:

Background: Structural variation (SV) calling from DNA sequencing data has been challenging due to several factors, such as the ambiguity of short-read alignments, multiple complex SVs in the same genomic region, and the lack of "truth" datasets for benchmarking. Additionally, caller choice, parameter settings, and alignment method are known to affect SV calling. However, the impact of FASTQ read order on SV calling has not been explored for long-read data.

Results: In this study, we used PacBio DNA sequencing data from 15 Caenorhabditis elegans isolates to evaluate the dependence of different SV callers on FASTQ read order. Comparisons of variant call format (VCF) files generated from the original and permutated FASTQ files demonstrated that the order of input data had a large impact on SV prediction, particularly for pbsv. The overall differences were lowest for Sniffles, regardless of the aligner used. The type of variant most affected by read order varied by caller. For pbsv, most differences occurred for deletions and duplications, while for Sniffles, permutating the read order had a stronger impact on insertions. For SVIM, inversions and deletions accounted for most differences.

Conclusion: The results of this study highlight the dependence of SV calling on the order of reads encoded in FASTQ files, which has not been recognized in long-read approaches. These findings have implications for the replication of SV studies and the development of consistent SV calling protocols. Our study suggests that researchers should pay attention to the order of reads when analyzing long-read sequencing data for SV calling.
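To make the idea of a permutated FASTQ file concrete, here is a minimal Python sketch of one way to shuffle read order while leaving every read unchanged. It is not the authors' pipeline: the function names (`parse_fastq`, `shuffle_fastq`, `format_fastq`) are hypothetical, and it assumes plain four-line FASTQ records (no wrapped sequence lines) that fit in memory.

```python
import random
from typing import Iterable, List, Tuple

# One FASTQ record: (header, sequence, separator, qualities).
FastqRecord = Tuple[str, str, str, str]


def parse_fastq(lines: Iterable[str]) -> List[FastqRecord]:
    """Group lines into four-line FASTQ records (assumes no line wrapping)."""
    stripped = [line.rstrip("\n") for line in lines if line.strip()]
    if len(stripped) % 4 != 0:
        raise ValueError("FASTQ input must contain a multiple of four lines")
    records: List[FastqRecord] = []
    for i in range(0, len(stripped), 4):
        header, seq, plus, qual = stripped[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"Malformed FASTQ record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"Sequence/quality length mismatch in {header}")
        records.append((header, seq, plus, qual))
    return records


def shuffle_fastq(records: List[FastqRecord], seed: int) -> List[FastqRecord]:
    """Return a new list with the same records in a seeded random order."""
    rng = random.Random(seed)  # fixed seed keeps the permutation reproducible
    shuffled = list(records)   # copy so the original order is preserved
    rng.shuffle(shuffled)
    return shuffled


def format_fastq(records: Iterable[FastqRecord]) -> str:
    """Serialize records back to four-line FASTQ text."""
    return "".join(f"{h}\n{s}\n{p}\n{q}\n" for h, s, p, q in records)
```

A typical use would be to read a FASTQ file with `parse_fastq(open(path))`, write `format_fastq(shuffle_fastq(records, seed=1))` to a new file, and then run the same aligner and SV caller on both files before comparing the resulting VCFs. Recording the seed makes each permutation repeatable.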

Copyright belongs to the original authors. Visit the link for more info.

Podcast created by Paper Player, LLC