Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.03.27.534314v1?rss=1
Authors: Wirnsberger, G., Pritisanac, I., Oberdorfer, G., Gruber, K.
Abstract: Proteins are used in a wide variety of applications. For some applications, a specific protein property needs to be improved. Deep mutational scanning data can then be used to train a neural network, which can predict the effect of variants that have not been experimentally determined. Many approaches use the protein's sequence in one form or another as the data representation for their prediction method. Since proteins are not one-dimensional strings but feature a complex 3D structure containing interacting residues, using the protein's structure as input brings the advantage of already encoding the interactions rather than having to learn them. Here we present a new representation for describing protein structures using stacked 2D contact maps. The contact maps represent residue interactions, their evolutionary conservation, and changes in interactions that occur due to mutations. Additionally, we provide different methods to improve performance when training neural networks on small deep mutational scanning datasets (50 to 2,000 data points). To test this approach, we trained three neural network architectures on three deep mutational scanning datasets and assessed their performance compared to a neural network trained on protein sequences alone. We demonstrate the utility of this representation for training neural networks to predict the outcome of deep mutational scanning experiments. The use of structural representation, as well as data augmentation and pre-training, substantially reduced the training data requirements. Compared to the state-of-the-art sequence convolutional neural network, it led to improved performance on smaller datasets and comparable performance when bigger datasets are available, which demonstrates the power of our approach.
This work could pave the way to expanded usage of deep mutational scanning by substantially reducing experimental requirements, and could open the door for smaller facilities that might not have access to high-throughput screening.
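To make the idea of stacked 2D contact maps concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: the 8 Å distance cutoff, the use of one coordinate per residue, the way conservation is combined into a second channel, and the function names `contact_map` and `stacked_maps` are all assumptions made for illustration only.

```python
import numpy as np


def contact_map(coords, cutoff=8.0):
    """Binary residue-residue contact map from an (N, 3) coordinate array.

    Assumption: one coordinate per residue (e.g. the C-alpha atom) and a
    fixed distance cutoff in angstroms; the paper may define contacts
    differently.
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise difference vectors, shape (N, N, 3)
    diff = coords[:, None, :] - coords[None, :, :]
    # Euclidean distance between every residue pair, shape (N, N)
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return (dist <= cutoff).astype(np.float32)


def stacked_maps(coords, conservation, cutoff=8.0):
    """Stack a contact channel and a hypothetical conservation channel.

    Assumption: the conservation channel is the product of the two
    residues' conservation scores, kept only where a contact exists.
    """
    contacts = contact_map(coords, cutoff)
    cons = np.asarray(conservation, dtype=np.float32)
    cons_channel = contacts * np.outer(cons, cons)
    # Result has shape (channels, N, N), a common image-like CNN input layout
    return np.stack([contacts, cons_channel], axis=0)


# Toy example: four residues on a line, the last one far away
coords = np.array([[0.0, 0.0, 0.0],
                   [3.8, 0.0, 0.0],
                   [7.6, 0.0, 0.0],
                   [20.0, 0.0, 0.0]])
conservation = [1.0, 0.5, 0.2, 0.9]
maps = stacked_maps(coords, conservation)
print(maps.shape)
```

A mutation-specific channel, as mentioned in the abstract, could be added in the same way as another (N, N) array in the stack; how the authors compute it is not described here.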
Copyright belongs to the original authors. Visit the link for more info.
Podcast created by Paper Player, LLC