L0 segmentation enables data-driven concise representations of diverse epigenomic data

2023/1/27
PaperPlayer biorxiv bioinformatics

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.01.26.525794v1?rss=1

Authors: Balci, A. T., Chikina, M.

Abstract: Motivation: Epigenetic assays using next-generation sequencing (NGS) have furthered our understanding of functional genomic regions and the mechanisms of gene regulation. However, a single assay produces billions of data points, represented as nucleotide-resolution signal tracks. The signal strength at a given nucleotide is subject to numerous sources of technical and biological noise and thus conveys limited information about the underlying biological state. To draw biological conclusions, the data are typically summarized into higher-order patterns. Numerous specialized algorithms for summarizing epigenetic signal have been proposed, including methods for peak calling and for finding differentially methylated regions. A key unifying principle underlying these approaches is that they all leverage the strong prior that the signal must be locally consistent.

Results: We propose L0 segmentation as a universal framework for extracting locally coherent signals from diverse epigenetic sources. L0 segmentation both compresses and smooths the input signal by approximating it as piecewise constant. We implement a highly scalable L0 segmentation with additional loss functions designed for NGS epigenetic data types, including Poisson loss for single tracks and binomial loss for methylation/coverage data. In contrast to the widely used L1 segmentation problem, known as the fused lasso, the L0 solution does not induce global attenuation and is able to capture the salient features of the data over a wide range of compression values. Finally, we show that L0 segmentation can be used as an effective prior inside other machine learning models, such as matrix factorization.
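To make the piecewise-constant idea concrete, here is a minimal sketch of L0 segmentation in Python. It is not the authors' implementation: it assumes a squared-error loss (the abstract describes Poisson and binomial losses) and uses a simple quadratic-time dynamic program rather than the scalable solver the paper refers to. The function name `l0_segment` and the penalty parameter `lam` are hypothetical names chosen for illustration.

```python
import numpy as np


def l0_segment(y, lam):
    """Illustrative L0 segmentation with squared-error loss (assumed loss).

    Minimizes sum_i (y_i - x_i)^2 + lam * (number of jumps in x)
    with an O(n^2) dynamic program over segment boundaries.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Prefix sums let us evaluate the squared error of any segment in O(1).
    s1 = np.concatenate(([0.0], np.cumsum(y)))
    s2 = np.concatenate(([0.0], np.cumsum(y * y)))

    def seg_cost(i, j):
        # Squared error of approximating y[i:j] by its mean.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    best = np.full(n + 1, np.inf)  # best[j]: optimal objective for y[:j]
    best[0] = -lam                 # offsets the penalty so the first segment is free
    prev = np.zeros(n + 1, dtype=int)
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + lam + seg_cost(i, j)
            if cost < best[j]:
                best[j] = cost
                prev[j] = i

    # Backtrack through the stored boundaries and fill each segment with its mean.
    x = np.empty(n)
    j = n
    while j > 0:
        i = prev[j]
        x[i:j] = (s1[j] - s1[i]) / (j - i)
        j = i
    return x


signal = [1, 1, 1, 5, 5, 5]
print(l0_segment(signal, lam=1.0))    # small penalty: the jump is kept
print(l0_segment(signal, lam=100.0))  # large penalty: one constant segment
```

In this sketch each segment takes the plain mean of the points it covers, so increasing `lam` merges segments but does not shrink the fitted levels toward each other. That is one way to read the abstract's contrast with the fused lasso, whose L1 penalty on jump sizes shrinks the jumps themselves.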

Copyright belongs to the original authors. Visit the link for more info.

Podcast created by Paper Player, LLC