Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.02.09.527851v1?rss=1
Authors: Neupane, A., Chariker, J. H., Rouchka, E. C.
Abstract: G quadruplexes are short secondary DNA structures located throughout genomic DNA and transcribed RNA. Though G4 structures have been shown to form in vivo, no current search tools are known that examine these structures based on previously identified G quadruplexes, much less filter them based on similar sequence, structure, and thermodynamic properties. We present a framework for clustering G quadruplex sequences into families using the CD-HIT, MeShClust, and DNACLUST methods along with a combination of Starcode and BLAST. Utilizing this framework to filter and annotate clusters, 95 families of G quadruplex sequences were identified within the human genome. Profiles for each family were created using hidden Markov models to allow for identification of additional family members and to generate homology probability scores. The thermodynamic folding energy properties, functional annotation of genes associated with the sequences, scores from different prediction algorithms, and transcription factor binding and motif to the G4 region for the sequences within a family were used to annotate and compare the diversity within and across clusters. The resulting set of G quadruplex families can be used to further understand how different regions of the genome are regulated by factors targeting specific structures common to members of a specific cluster.
Copyright belongs to the original authors. Visit the link for more info.
Podcast created by Paper Player, LLC