Feature selection followed by a residuals-based normalization simplifies and improves single-cell gene expression analysis

2023/3/3
PaperPlayer biorxiv bioinformatics

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.03.02.530891v1?rss=1

Authors: Singh, A., Khiabanian, H.

Abstract: A critical step in the computational analysis of single-cell RNA-sequencing (scRNA-seq) counts data is normalization, in which the goal is to reduce biases introduced by technical sources that obscure the underlying biological variation of interest. This is typically accomplished by scaling the observed counts to reduce the differences in sampling depths between the cells. The scaled counts are then transformed to stabilize the variances of genes across different expression levels. In the standard scRNA-seq workflows, this is followed by a feature selection step that identifies highly variable genes, which capture most of the biologically meaningful variation across the cells. In this article, we propose a simple feature selection method and show that we can perform feature selection based on the observed counts before normalization. Importantly, our feature selection method identifies both variable and stable genes. The latter can be used to estimate cell-specific size factors, since the variation in their counts is more likely to reflect the unwanted biases. Selecting features prior to normalization also means that we no longer have to rely on the variances of the transformed counts to identify variable genes. This makes it possible to ensure effective variance stabilization during normalization. Keeping this in mind, we propose a residuals-based normalization method that reduces the impact of sampling depth differences and simultaneously ensures variance stabilization by utilizing a monotonic non-linear transformation. We demonstrate significant improvements in downstream clustering analyses enabled by the application of our feature selection and normalization methods on both truth-known real biological and simulated counts data sets. Based on these results, we make the case for a revised scRNA-seq analysis workflow in which we first perform feature selection and then perform normalization using the residuals-based approach. The novel workflow, with the proposed feature selection and normalization methods, has been implemented in an R package called Piccolo.
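
To make the idea of size factors concrete, here is a minimal Python sketch of depth scaling followed by a log transform, with the size factors estimated from a subset of genes assumed to be stable. This is not the Piccolo implementation or the authors' residuals-based method: the function names, the toy counts matrix, the choice of stable genes, and the use of log1p as the transform are all illustrative assumptions.

```python
import math

def size_factors(counts, stable_genes=None):
    """Per-cell size factors: each cell's total count over the chosen genes,
    divided by the mean of those totals across cells.
    Assumes every cell has a nonzero total over the chosen genes."""
    n_genes = len(counts[0])
    genes = range(n_genes) if stable_genes is None else stable_genes
    totals = [sum(cell[g] for g in genes) for cell in counts]
    mean_total = sum(totals) / len(totals)
    return [t / mean_total for t in totals]

def normalize(counts, factors):
    """Divide each cell's counts by its size factor, then apply log1p
    (a generic stand-in for a variance-stabilizing transform)."""
    return [[math.log1p(c / f) for c in cell] for cell, f in zip(counts, factors)]

# Toy counts matrix: rows are cells, columns are genes (hypothetical data).
counts = [
    [10, 0, 5, 20],
    [20, 0, 10, 40],
    [5, 1, 2, 10],
]

# Assumption for illustration: genes 0 and 3 were flagged as stable.
factors = size_factors(counts, stable_genes=[0, 3])
normalized = normalize(counts, factors)
```

In this toy example the second cell has exactly twice the counts of the first, so after scaling by the size factors the two cells end up with identical normalized values, which is the depth effect the scaling step is meant to remove.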

Copyrights belong to the original authors. Visit the link for more info.

Podcast created by Paper Player, LLC