VariPred: Enhancing Pathogenicity Prediction of Missense Variants Using Protein Language Models

2023/3/20
PaperPlayer biorxiv bioinformatics

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.03.16.532942v1?rss=1

Authors: Lin, W., Wells, J., Wang, Z., Orengo, C., Martin, A. C. R.

Abstract: Computational approaches for predicting the pathogenicity of genetic variants have advanced in recent years. These methods enable researchers to determine the possible clinical impact of rare and novel variants. Historically, these prediction methods used hand-crafted features based on structural, evolutionary, or physicochemical properties of the variant. In this study, we propose a novel framework that leverages the power of pre-trained protein language models to predict variant pathogenicity. We show that our approach, VariPred (Variant impact Predictor), outperforms current state-of-the-art methods by using an end-to-end model that requires only the protein sequence as input. By exploiting one of the best-performing protein language models (ESM-1b), we established a robust classifier, VariPred, requiring no pre-calculation of structural features or multiple sequence alignments. We compared the performance of VariPred with other representative models including 3Cnet, EVE and ESM variant. VariPred outperformed all these methods on the ClinVar dataset, achieving an MCC of 0.751 vs. an MCC of 0.690 for the next closest predictor.
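For listeners unfamiliar with the metric quoted above: MCC is the Matthews correlation coefficient, a score between -1 and 1 computed from the four cells of a binary confusion matrix. The following is a minimal sketch of the standard formula, not code from the paper; the function names and the toy labels are illustrative assumptions, with 1 standing for pathogenic and 0 for benign.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count (tp, tn, fp, fn) for binary labels, where 1 = pathogenic, 0 = benign."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when any marginal total is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy labels (made up for illustration, not data from the paper).
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 1]
score = mcc(*confusion_counts(y_true, y_pred))
```

Unlike plain accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when pathogenic and benign variants are unevenly represented in a dataset.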

Copyright belongs to the original authors. Visit the link for more info.

Podcast created by Paper Player, LLC