Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.02.22.529574v1?rss=1
Authors: Nambiar, A., Forsyth, J. M., Liu, S., Maslov, S.
Abstract: Despite their lack of a rigid structure, intrinsically disordered regions in proteins play important roles in cellular functions, including mediating protein-protein interactions. Therefore, it is important to computationally annotate disordered regions of proteins with high accuracy. Most popular tools use evolutionary or biophysical features to make predictions of disordered regions. In this study, we present DR-BERT, a compact protein language model that is first pretrained on a large number of unannotated proteins before being trained to predict disordered regions. Although it does not use any evolutionary or biophysical information, DR-BERT shows a statistically significant improvement when compared to several existing methods on a gold standard dataset. We show that this performance is due to the information learned during pre-training and DR-BERT's ability to use contextual information. A web application for using DR-BERT is available at https://huggingface.co/spaces/nambiar4/DR-BERT and the code to run the model can be found at https://github.com/maslov-group/DR-BERT.
Copyright belongs to the original authors. Visit the link for more info.
Podcast created by Paper Player, LLC