Dividing out quantification uncertainty allows efficient assessment of differential transcript expression

2023/4/4
PaperPlayer biorxiv bioinformatics

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.04.02.535231v1?rss=1

Authors: Baldoni, P. L., Chen, Y., Hediyeh-zadeh, S., Liao, Y., Dong, X., Ritchie, M. E., Shi, W., Smyth, G. K.

Abstract: A major challenge in the analysis of RNA-seq data at the transcript-level is accounting for the variability introduced during quantification of RNA sequencing reads. This variability is due to the high levels of sequence similarity among transcripts annotated to the same genomic locus and the mapping ambiguity resulting from the assignment of sequence reads to such transcripts. The quantification uncertainty associated with transcript-level estimated counts is intractable to measure analytically but represents an extra source of variation that seriously compromises differential transcript expression (DTE) analyses if standard statistical methods developed for gene-level analyses are used. Bootstrap counts, as provided by popular RNA-seq quantification tools, allow one to estimate the quantification uncertainty and account for such an effect in DTE analyses. We present catchSalmon and catchKallisto, two functions included in the R/Bioconductor package edgeR, that estimate the transcript-level quantification uncertainty, here termed mapping ambiguity overdispersion, using bootstrap counts. We discuss how the mapping ambiguity overdispersion can be effectively removed from the data in transcript-level analyses via count scaling, an approach that reduces the size of the estimated counts obtained from quantification tools to effective count sizes that reflect their true precision. The presented count scaling approach allows users to perform efficient DTE analyses within the efficient edgeR framework. A comprehensive simulation study and a DTE analysis of human lung adenocarcinoma cell lines are presented to illustrate the benefits of accounting for the mapping ambiguity overdispersion in transcript-level RNA-seq data analyses.
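
The count scaling idea described in the abstract can be sketched in a few lines of Python. This is a simplified illustration and not the edgeR implementation of catchSalmon or catchKallisto: it assumes the overdispersion of one transcript can be approximated by a pooled Pearson-type statistic over the bootstrap counts of each sample, floored at 1, and it omits any moderation across transcripts. The function names `estimate_overdispersion` and `scale_counts` and the toy numbers are hypothetical.

```python
from statistics import mean


def estimate_overdispersion(boot_by_sample):
    """Rough mapping-ambiguity overdispersion for ONE transcript.

    boot_by_sample: list with one entry per sample, each entry being the
    list of bootstrap counts for that transcript in that sample.
    Assumption: pooled sum of (b - mean)^2 / mean over samples, divided by
    the pooled degrees of freedom, and never allowed to fall below 1.
    """
    stat = 0.0
    df = 0
    for boots in boot_by_sample:
        m = mean(boots)
        if m <= 0 or len(boots) < 2:
            continue  # this sample carries no information about the variance
        stat += sum((b - m) ** 2 for b in boots) / m
        df += len(boots) - 1
    if df == 0:
        return 1.0
    return max(1.0, stat / df)


def scale_counts(counts, overdispersion):
    """Divide estimated counts by the overdispersion to get effective counts."""
    return [c / overdispersion for c in counts]


# Toy data: two samples, five bootstrap resamples each (made-up numbers).
boots = [[90, 110, 100, 80, 120], [40, 60, 50, 45, 55]]
disp = estimate_overdispersion(boots)
print(disp)                           # 1.875
print(scale_counts([100, 50], disp))  # effective counts, smaller than the raw ones
```

A transcript whose bootstrap counts barely vary gets an overdispersion of 1 and keeps its counts unchanged, while an ambiguously mapped transcript is scaled down, so downstream count models see a size that matches its actual precision.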

Copyright belongs to the original authors. Visit the link for more information.

Podcast created by Paper Player, LLC