
Protein language model powers accurate and fast sequence search for remote homology

2023/4/5

PaperPlayer biorxiv bioinformatics



Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.04.03.535375v1?rss=1

Authors: Liu, W., Wang, Z., You, R., Xie, C., Wei, H., Xiong, Y., Yang, J., Zhu, S.

Abstract: Homologous protein search is one of the most commonly used methods for protein annotation and analysis. Compared to structure search, detecting distant evolutionary relationships from sequences alone remains challenging. Here we propose PLMSearch (Protein Language Model), a homologous protein search method with only sequences as input. By using deep representations from a pre-trained protein language model to predict similarity, PLMSearch is able to capture the remote homology information hidden behind the sequences. Extensive experimental results show that PLMSearch outperforms other sequence search methods under different scenarios, and is comparable to the state-of-the-art structure search methods. In particular, like structure search methods, PLMSearch can recall most remote homology pairs, which have low sequence similarity but share similar structures. PLMSearch is freely available at http://issubmission.sjtu.edu.cn/PLMSearch.
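
To make the idea of searching by learned representations concrete, here is a minimal sketch of ranking target proteins against a query by the similarity of their embedding vectors. It is an illustration only, not the PLMSearch method: the abstract does not say how similarity is predicted, so plain cosine similarity is swapped in here as an assumption. The embedding values, the protein names, and the function names `cosine_similarity` and `rank_targets` are all hypothetical; in practice the vectors would come from a pre-trained protein language model.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_targets(query_embedding, target_embeddings, top_k=3):
    """Return the top_k (name, score) pairs, highest similarity first."""
    scored = [
        (name, cosine_similarity(query_embedding, embedding))
        for name, embedding in target_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 4-dimensional embeddings (made-up values, assumed for illustration;
# real language-model embeddings have hundreds or thousands of dimensions).
targets = {
    "target_A": [0.9, 0.1, 0.0, 0.2],
    "target_B": [0.0, 1.0, 0.3, 0.0],
    "target_C": [0.8, 0.2, 0.1, 0.1],
}
query = [1.0, 0.0, 0.0, 0.2]

ranking = rank_targets(query, targets, top_k=2)
for name, score in ranking:
    print(f"{name}\t{score:.3f}")
```

Because the comparison happens in embedding space rather than on aligned residues, two proteins can score as similar even when their sequences share little identity, which is the situation the abstract calls remote homology.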

Copyright belongs to the original authors. Visit the link for more info.

Podcast created by Paper Player, LLC