Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.03.29.534691v1?rss=1
Authors: Wang, L., van der Toorn, W., Bohn, P., Hölzer, M., Smyth, R., von Kleist, M.
Abstract: Motivation: Direct RNA sequencing (dRNA-seq) on the Oxford Nanopore Technologies platforms has become increasingly popular in recent years, with a promising outlook to transform the field of epitranscriptomics. Reads produced from dRNA-seq can cover up to full-length gene transcripts while containing decipherable information about RNA base modifications and poly-A tail lengths. Although many studies have been published exploring and expanding the potential of dRNA-seq, the sequencing accuracy and error patterns remain understudied and less well characterized than those of DNA sequencing. Results: We evaluated the sequencing accuracy and characterized the systematic error patterns of dRNA-seq on public datasets including native RNA samples from diverse organisms, as well as synthetic in vitro transcribed RNAs. The median read accuracy is about 90% for most organisms, although some species are more challenging. Deletions account for the majority of errors and are twice as common as mismatches or insertions. Apart from the well-known homopolymer sequencing errors, there are systematic biases across all organisms at both the single-nucleotide and 2-mer motif levels. In general, cytosines and uracils are more likely to be erroneous than guanines and adenines. Moreover, the systematic errors are found to be strongly dependent on the local sequence context, with complex interactions between adjacent positions. We further evaluated the accuracy of sequencing homopolymers, read quality scores as an estimate of error rates, and the consequences of failing to detect the DNA adaptors. Lastly, we discuss the relevance of such error patterns for the downstream applications of dRNA-seq data, such as transcript identification and base modification detection.
Copyright belongs to the original authors. Visit the link for more information.
Podcast created by Paper Player, LLC