TemStaPro: protein thermostability prediction using sequence representations from protein language models

2023/3/28
PaperPlayer biorxiv bioinformatics

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.03.27.534365v1?rss=1

Authors: Pudziuvelyte, I., Olechnovic, K., Godliauskaite, E., Sermokas, K., Urbaitis, T., Gasiunas, G., Kazlauskas, D.

Abstract: Reliable prediction of protein thermostability from its sequence is valuable for both academic and industrial research. This prediction problem can be tackled using machine learning and by taking advantage of the recent blossoming of deep learning methods for sequence analysis. We propose applying the principle of transfer learning to predict protein thermostability using embeddings generated by protein language models (pLMs) from an input protein sequence. We used large pLMs that were pre-trained on hundreds of millions of known sequences. The embeddings from such models allowed us to efficiently train and validate a high-performing prediction method using over 2 million sequences that we collected from organisms with annotated growth temperatures. Our method, TemStaPro (Temperatures of Stability for Proteins), was used to predict thermostability of CRISPR-Cas Class II effector proteins (C2EPs). Predictions indicated sharp differences among groups of C2EPs in terms of thermostability and were largely in tune with previously published and our newly obtained experimental data. TemStaPro software is freely available from https://github.com/ievapudz/TemStaPro.
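The transfer-learning idea in the abstract can be illustrated with a minimal sketch. This is not the TemStaPro implementation. It assumes that per-residue embeddings are averaged into one fixed-length vector per protein and that a small classifier is then trained on those vectors. The `fake_embedding` function is a hypothetical stand-in for a real protein language model, the data are synthetic, and plain logistic regression is swapped in for whatever classifier the authors actually trained.

```python
import math
import random


def fake_embedding(length, dim, shift, rng):
    """Hypothetical stand-in for a pLM: one random vector per residue.

    The class-dependent `shift` is an assumption made only so that the
    toy data are learnable; a real model would compute embeddings from
    the amino acid sequence itself.
    """
    return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(length)]


def mean_pool(per_residue):
    """Average per-residue vectors into one fixed-length protein vector."""
    n = len(per_residue)
    dim = len(per_residue[0])
    return [sum(row[d] for row in per_residue) / n for d in range(dim)]


def sigmoid(z):
    # Clamp the input to avoid overflow in math.exp.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Probability that a protein belongs to the positive (stable) class."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_logreg(X, y, lr=0.1, epochs=200):
    """Logistic regression fitted with plain stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            grad = predict(w, b, x) - target
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


rng = random.Random(0)
DIM = 8  # toy embedding size; real pLM embeddings are far larger

X, y = [], []
for i in range(200):
    label = i % 2                      # 1 = "thermostable" in this toy setup
    shift = 0.5 if label else -0.5
    length = rng.randint(50, 300)      # sequences differ in length
    X.append(mean_pool(fake_embedding(length, DIM, shift, rng)))
    y.append(label)

# Hold out the last 50 proteins for evaluation.
X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]

w, b = train_logreg(X_train, y_train)
correct = sum((predict(w, b, x) >= 0.5) == bool(t) for x, t in zip(X_test, y_test))
accuracy = correct / len(y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Pooling gives every protein a vector of the same size regardless of sequence length, which is what lets a simple classifier be trained on top of a frozen, pre-trained model.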

Copyright belongs to the original authors. Visit the link for more information.

Podcast created by Paper Player, LLC