Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.07.17.549334v1?rss=1
Authors: Tanaka, Y., Kajitani, R., Itoh, T.
Abstract: Repeat sequences in the genome can be classified into interspersed and tandem repeats, both of which are important for understanding genome evolution and traits such as disease. They are also noteworthy as regions with a high frequency of genome rearrangement in somatic cells and high inter-individual diversity. Existing repeat detection tools are limited in that they target only one of the two types and/or require reference sequences. In this study, we developed a novel tool, cycle_finder, which constructs a graph structure (de Bruijn graph) from low-cost short-read data and constructs units of both types of repeats. The tool can detect cycles with branching and the corresponding tandem repeats, and can also construct interspersed repeats by exploring non-cycle subgraphs. Furthermore, it can estimate sequences with large copy-number differences by using two samples as input. Benchmarking with simulations and actual data from the human genome showed that this tool had superior recall and precision compared to existing methods. In a test on roundworm data, in which large-scale deletions occur in somatic cells, the tool succeeded in detecting deletion sequences reported in previous studies. This tool is expected to enable low-cost analysis of repeat sequences that were previously difficult to construct.
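To make the core idea in the abstract concrete, here is a minimal Python sketch of how a tandem repeat shows up as a cycle in a de Bruijn graph built from short reads. It is a toy illustration only, not the cycle_finder implementation: the function names (build_de_bruijn, find_cycle, cycle_to_unit), the choice of k, and the example read are all assumptions made for this example, and it ignores branching, coverage, sequencing errors, and the two-sample comparison described above.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, and each k-mer
    seen in a read adds an edge from its prefix to its suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def find_cycle(graph, start):
    """Depth-first search for a simple cycle that returns to `start`.
    Returns the list of nodes on the cycle, or None if there is none.
    (Hypothetical helper; exponential in the worst case, fine for a toy.)"""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        for nxt in sorted(graph.get(node, ())):
            if nxt == start:
                return path
            if nxt not in path:
                stack.append((nxt, path + [nxt]))
    return None


def cycle_to_unit(cycle):
    """Spell out the repeat unit: each node on the cycle contributes
    its last base, so a cycle of n nodes yields a unit of length n."""
    return "".join(node[-1] for node in cycle)


# Toy input: a single read containing the tandem repeat (ACG)n.
reads = ["ACGACGACGACG"]
graph = build_de_bruijn(reads, k=4)
cycle = find_cycle(graph, "ACG")
unit = cycle_to_unit(cycle)
print(unit)  # prints "GAC", a rotation of the repeat unit ACG
```

Because a cycle has no fixed starting point, the recovered unit is only defined up to rotation, which is why the sketch reports GAC rather than ACG.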
Copyrights belong to the original authors. Visit the link for more info.
Podcast created by Paper Player, LLC