Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.01.30.526193v1?rss=1
Authors: Cheng, L., Yu, T., Aittokallio, T. A., Corander, J., Khalitov, R., Yang, Z.
Abstract: Due to their intrinsic properties, DNA molecules commonly exhibit long-range interactions along a linear sequence representation. Taking this information into account when modeling DNA sequences is therefore important for obtaining more accurate sequence-based inference. Many deep learning methods have recently been developed for this purpose, but they still suffer from two major issues. First, the existing methods can only handle short DNA fragments, thereby losing longer-range interactions. Second, the current methods require massive supervised labeling while missing most order information within the sequences. Consequently, there is a need to develop an efficient deep neural network modeling framework to extract wide contextual information for more accurate sequence-based inference tasks. Our new framework, named Revolution, takes full DNA sequences as input, without any condensation, and can give accurate predictions for DNA sequences up to 10kbp. In variant effect prediction, our method increases the Area Under the Receiver Operating Characteristics (AUROC) by 19.61% on 49 human tissues on average. Revolution is also demonstrated to work on the plant sequences by improving 2.36% AUROC on average for predicting open chromatin regions (OCRs).
Copy rights belong to original authors. Visit the link for more info
Podcast created by Paper Player, LLC