Machine learning models for the prediction of enzyme properties should be tested on proteins not used for model training

2023/2/7
PaperPlayer biorxiv bioinformatics

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.02.06.526991v1?rss=1

Authors: Kroll, A., Lercher, M. J.

Abstract: The turnover number kcat quantifies the catalytic efficiency of enzymes. As experimental kcat estimates are expensive and time-consuming, it is desirable to develop computational pipelines able to predict turnover numbers of arbitrary enzymes from easily accessible features. Advances in deep learning have now put such predictions within reach. In a recent publication, Li et al. described DLKcat, a general deep learning model for kcat predictions. The authors state that their approach facilitates "high-throughput kcat prediction for metabolic enzymes from any organism merely from substrate structures and protein sequences." Furthermore, they claim that "DLKcat can capture kcat changes for mutated enzymes". Here, we show that DLKcat predictions are accurate only for enzymes that are highly similar to proteins used for training, and become positively misleading for enzymes without close homologs in the training data. We further show that DLKcat's mutant predictions (all of which were made for enzymes highly similar to training data) are much less accurate than implied by the DLKcat publication, capturing only 3% of the experimentally observed variation across mutants not included in the training data.
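
The evaluation idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code or the DLKcat pipeline. The record layout (sequence, substrate, log10 kcat) and the function names `split_by_protein` and `r_squared` are hypothetical, and reading "captures 3% of the variation" as a coefficient of determination of about 0.03 is an assumption. The split below only keeps identical sequences out of the test set; holding out enzymes without close homologs would additionally require clustering by sequence similarity, which is not shown.

```python
import random
from collections import defaultdict


def split_by_protein(records, test_fraction=0.2, seed=0):
    """Split records so that no protein sequence occurs in both sets.

    `records` is a list of (sequence, substrate, log10_kcat) tuples; this
    layout is hypothetical and chosen only for illustration.
    """
    # Group all measurements that share the same protein sequence.
    by_sequence = defaultdict(list)
    for record in records:
        by_sequence[record[0]].append(record)

    # Shuffle the unique sequences (not the records) reproducibly.
    sequences = sorted(by_sequence)
    random.Random(seed).shuffle(sequences)
    n_test = max(1, round(test_fraction * len(sequences)))
    test_sequences = set(sequences[:n_test])

    train = [r for s in sequences if s not in test_sequences for r in by_sequence[s]]
    test = [r for s in sequences if s in test_sequences for r in by_sequence[s]]
    return train, test


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot.

    Undefined (division by zero) if all observed values are identical.
    """
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

With such a split, a model would be fitted on `train` only and `r_squared` would be computed on the held-out `test` records; a value near zero means the predictions explain almost none of the observed variation.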

Copyright belongs to the original authors. Visit the link for more information.

Podcast created by Paper Player, LLC