Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.04.03.535375v1?rss=1
Authors: Liu, W., Wang, Z., You, R., Xie, C., Wei, H., Xiong, Y., Yang, J., Zhu, S.
Abstract: Homologous protein search is one of the most commonly used methods for protein annotation and analysis. Compared to structure search, detecting distant evolutionary relationships from sequences alone remains challenging. Here we propose PLMSearch (Protein Language Model), a homologous protein search method with only sequences as input. By using deep representations from a pre-trained protein language model to predict similarity, PLMSearch is able to capture the remote homology information hidden behind the sequences. Extensive experimental results show that PLMSearch outperforms other sequence search methods under different scenarios and is comparable to the state-of-the-art structure search methods. In particular, like structure search methods, PLMSearch can recall most remote homology pairs, which have low sequence similarity but share similar structures. PLMSearch is freely available at http://issubmission.sjtu.edu.cn/PLMSearch.
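As a rough illustration of the idea in the abstract (searching by similarity between learned sequence representations rather than by sequence alignment), the sketch below ranks target proteins by cosine similarity between embedding vectors. This is not the PLMSearch method itself: the abstract says PLMSearch predicts similarity from the representations, and plain cosine similarity is swapped in here as a simpler stand-in. The function name rank_targets, the protein IDs, and the 3-dimensional toy vectors are all hypothetical; real embeddings would come from a pre-trained protein language model.

```python
import numpy as np

def rank_targets(query_emb, target_embs, target_ids):
    """Rank targets by cosine similarity to the query embedding, highest first.

    Assumption: each protein is already represented by one fixed-length vector
    (for example, a per-protein embedding from a protein language model).
    """
    q = np.asarray(query_emb, dtype=float)
    t = np.asarray(target_embs, dtype=float)
    # Normalize to unit length so the dot product equals cosine similarity.
    q = q / np.linalg.norm(q)
    t = t / np.linalg.norm(t, axis=1, keepdims=True)
    scores = t @ q
    order = np.argsort(-scores)  # indices sorted by descending score
    return [(target_ids[i], float(scores[i])) for i in order]

# Toy vectors standing in for real embeddings (hypothetical values).
query = [1.0, 0.0, 0.0]
targets = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.0]]
ranked = rank_targets(query, targets, ["protA", "protB", "protC"])
for protein_id, score in ranked:
    print(protein_id, round(score, 3))
```

In a real search the target embeddings would be computed once and cached, so each query costs only one matrix-vector product over the database.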
Copyrights belong to the original authors. Visit the link for more info.
Podcast created by Paper Player, LLC