Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.07.13.548862v1?rss=1
Authors: Song, Y., Yuan, Q., Zhao, H., Yang, Y.
Abstract: The interactions between proteins and nucleic acids play crucial roles in various biological activities and in the design of new drugs. Accurately identifying nucleic-acid-binding sites remains a challenging task. Existing sequence-based methods have limited predictive performance because they consider only the contextual features of sequential neighbors, while structure-based methods are unsuitable for the many proteins that lack known tertiary structures. Although protein structures predicted by AlphaFold2 could be used, the heavy computational cost of AlphaFold2 hinders its use in genome-wide applications. Building on the recent breakthrough of ESMFold for fast protein structure prediction, we propose GLMSite, an accurate predictor for identifying DNA- and RNA-binding sites through geometric graph learning on ESMFold-predicted structures. Here, the predicted protein structures are used to construct a protein structural graph with residues as nodes and spatially neighboring residue pairs as edges. The node representations are further enhanced through the pre-trained language model ProtTrans. The network was trained through the geometric vector perceptron, and the learned geometric embeddings are fed into a common network to learn shared binding characteristics, followed by two fully connected layers that learn specific binding patterns for DNA and RNA, respectively. In comprehensive tests on DNA/RNA benchmark datasets, GLMSite was shown to surpass the latest sequence-based methods and to be comparable with structure-based methods. Moreover, the predictions were shown to be useful for identifying nucleic-acid-binding proteins, demonstrating the method's potential for protein function discovery. The datasets, source code, and trained models are available at https://github.com/biomed-AI/nucleic-acid-binding.
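As a rough illustration of the structural graph described in the abstract (residues as nodes, spatially neighboring residue pairs as edges), here is a minimal sketch in plain Python. It is not the authors' implementation: the function name `build_residue_graph`, the use of one representative coordinate per residue (for example the C-alpha atom), and the 10 angstrom distance cutoff are all assumptions made for this example, since the abstract does not say how spatial neighbors are defined.

```python
import math


def build_residue_graph(coords, cutoff=10.0):
    """Return undirected edges (i, j) with i < j for residue pairs
    whose representative atoms lie within `cutoff` angstroms.

    `coords` holds one (x, y, z) tuple per residue. The cutoff value is
    an assumption for illustration, not a value taken from the paper.
    """
    edges = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            # math.dist computes the Euclidean distance between two points.
            if math.dist(coords[i], coords[j]) <= cutoff:
                edges.append((i, j))
    return edges


# Toy input: four made-up residue positions in angstroms, not real data.
toy_coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (30.0, 0.0, 0.0),
]
toy_edges = build_residue_graph(toy_coords, cutoff=10.0)
print(toy_edges)
```

In this toy case the first three residues are connected to one another and the distant fourth residue has no edges. The pairwise loop is quadratic in the number of residues, which is fine for a sketch; a real pipeline would more likely use a spatial index or a k-nearest-neighbor search.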
Copyright belongs to the original authors. Visit the link for more info.
Podcast created by Paper Player, LLC