Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.03.29.534803v1?rss=1
Authors: Busia, A., Listgarten, J.
Abstract: Characterizing differences in biological sequences between two conditions using high-throughput sequencing data is a prevalent problem wherein we seek to (i) quantify differences in sequence abundances between conditions, and (ii) build predictive models to estimate such differences for unobserved sequences. A key shortcoming of current approaches is their extremely limited ability to share information across related but non-identical reads. Consequently, they cannot make effective use of sequencing data, nor can they be directly applied in many settings of interest. We introduce model-based enrichment (MBE) to overcome this shortcoming. MBE is based on sound theoretical principles, is easy to implement, and can trivially make use of advances in modern-day machine learning classification architectures or related innovations. We extensively evaluate MBE empirically, both in simulation and on real data. Overall, we find that our new approach improves accuracy compared to current ways of performing such differential analyses.
Copyright belongs to the original authors. Visit the link for more info.
Podcast created by Paper Player, LLC