Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.04.16.537101v1?rss=1
Authors: Shirali Hossein Zade, R., Abeel, T.
Abstract: Determining accurate genotypes is important for associating phenotypes to genotypes. De novo genome assembly is a critical step to determine the complete genotype for species for which no reference exists yet. The main challenge of de novo eukaryote genome assembly, particularly for plant genomes, is the repetitive DNA sequences within their genomes. The introduction of third-generation sequencing and corresponding long reads has promised to resolve repeat-related problems. While there have been notable improvements, reads originating from these repeats still cause errors because they introduce false overlaps in the assembly graph. This study focuses on analyzing the effect of repeats on de novo assembly and improving the performance of existing de novo assembly algorithms by removing repeat-induced overlaps. First, we show the possible improvements in de novo assembly from removing repeat-induced overlaps. Then we propose several methods for detecting and removing repeat-induced overlaps and evaluate their performance on several simulated datasets.
Copyright belongs to the original authors. Visit the link for more info.
Podcast created by Paper Player, LLC