Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.02.04.527099v1?rss=1
Authors: Zhou, X., Chen, G., Ye, J., Wang, E., Zhang, J., Mao, C., Li, Z., Hao, J., Huang, X., Tang, J., Heng, P. A.
Abstract: Inverse Protein Folding (IPF) is an important task of protein design, which aims to design sequences compatible with a given backbone structure. Despite the prosperous development of algorithms for this task, existing methods tend to leverage limited and noisy residue environment when generating sequences. In this paper, we develop an iterative sequence refinement pipeline, which can refine the sequence generated by existing sequence design models. It selects and retains reliable predictions based on the model's confidence in predicted distributions, and decodes the residue type based on a partially visible environment. The proposed scheme can consistently improve the performance of a number of IPF models on several sequence design benchmarks, and increase sequence recovery of the SOTA model by up to 10%. We finally show that the proposed model can be applied to redesign Transposon-associated transposase B. 8 variants exhibit improved gene editing activity among the 20 variants we proposed. Our code and a demo of the refinement pipeline are provided in the online colab.
Copy rights belong to original authors. Visit the link for more info
Podcast created by Paper Player, LLC