Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.06.26.546495v1?rss=1
Authors: Vogl, C., Karapetiants, M., Yildirim, B., Kjartansdottir, H., Kosiol, C., Bergman, J., Majka, M., Mikula, L. C.
Abstract: Motivation: Genomes are inherently inhomogeneous, with features such as base composition and gene density varying along chromosomes. Biological analyses often aim to quantify this variation, account for it during inference procedures, and ultimately determine the causal processes behind it. Since sequential observations along chromosomes are not independent, it is unsurprising that autocorrelation patterns have been observed, e.g., in the base composition of hominids. In this article, we develop a class of hidden Markov models (HMMs) called oHMMed (ordered HMM with emission densities). They identify the number of comparably homogeneous regions within observed sequences with autocorrelation. These are modelled as discrete hidden states; the observed data points are then realisations of continuous probability distributions with state-specific means that enable ordering of these distributions. The observed sequence is labelled according to the hidden states, permitting only neighbouring states that are also neighbours within the ordering of their associated distributions. The parameters that characterise these state-specific distributions are then inferred. Results: We apply oHMMed to the proportion of G and C bases (modelled as a mixture of normal distributions) and the number of genes (modelled as a mixture of Poisson-gamma distributions) in windows along the human, mouse, and fruit fly genomes. This results in a partitioning into statistically distinguishable regions according to these two features for each species, and facilitates the quantification of the positive correlation between them. Availability and Implementation: The algorithms are implemented in the R package oHMMed, and details are available on GitHub.
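The ordering constraint described in the abstract (hidden states may only switch to states whose emission means are adjacent in the ordering) can be pictured as a tridiagonal transition matrix. The Python sketch below is an illustration only and is not the oHMMed R package or its API: the function names, the switching probability `p_move`, the state means, and the standard deviation are all hypothetical values chosen for the example. It simulates a sequence with normal emissions, loosely resembling GC proportion in windows, rather than performing the inference the paper describes.

```python
import random


def tridiagonal_transitions(n_states, p_move=0.05):
    """Build a row-stochastic matrix that only allows moves to adjacent states.

    p_move is a hypothetical per-step probability of switching to each neighbour.
    """
    T = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if i > 0:
            T[i][i - 1] = p_move
        if i < n_states - 1:
            T[i][i + 1] = p_move
        # Remaining probability mass stays in the current state.
        T[i][i] = 1.0 - sum(T[i])
    return T


def simulate(T, means, sd, length, seed=0):
    """Simulate hidden states and normally distributed observations."""
    rng = random.Random(seed)
    state = rng.randrange(len(means))
    states, obs = [], []
    for _ in range(length):
        states.append(state)
        obs.append(rng.gauss(means[state], sd))
        # Draw the next state from the current row of the transition matrix.
        state = rng.choices(range(len(means)), weights=T[state])[0]
    return states, obs


# Hypothetical state means in increasing order (assumed values, not from the paper).
means = [0.35, 0.42, 0.50]
T = tridiagonal_transitions(len(means))
states, obs = simulate(T, means, sd=0.02, length=1000)
```

Because the first and last rows of `T` have a zero in the far corner, the simulated sequence can never jump directly between the lowest-mean and highest-mean states; it has to pass through the intermediate one.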
Copyright belongs to the original authors. Visit the link for more information.
Podcast created by Paper Player, LLC