
Evaluating Prompt Tuning for Conditional Protein Sequence Generation

2023/3/1

PaperPlayer biorxiv bioinformatics

Shownotes Transcript

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.02.28.530492v1?rss=1

Authors: Nathansen, A., Klein, K., Renard, B. Y., Nowicka, M., Bartoszewicz, J. M.

Abstract: Text generation models originally developed for natural language processing have proven to be successful in generating protein sequences. These models are often finetuned for improved performance on more specific tasks, such as generation of proteins from families unseen in training. Considering the high computational cost of finetuning separate models for each downstream task, prompt tuning has been proposed as an alternative. However, no openly available implementation of this approach compatible with protein language models has been previously published. Thus, we adapt an open-source codebase designed for NLP models to build a pipeline for prompt tuning on protein sequence data, supporting the protein language models ProtGPT2 and RITA. We evaluate our implementation by learning prompts for conditional sampling of sequences belonging to a specific protein family. This results in improved performance compared to the base model. However, in the presented use case, we observe discrepancies between text-based evaluation and predicted biological properties of the generated sequences, identifying open problems for principled assessment of protein sequence generation quality.
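The core idea the abstract describes, prompt tuning, trains only a small set of learnable "virtual token" embeddings prepended to the input, while the base language model stays frozen. The following is a minimal illustrative sketch of that mechanism; the tiny embedding-plus-linear "model", the vocabulary size, and the placeholder loss are assumptions standing in for ProtGPT2/RITA and a real training objective, not the authors' pipeline.

```python
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    """Learnable prompt embeddings prepended to input embeddings."""

    def __init__(self, n_tokens: int, d_model: int):
        super().__init__()
        # One learnable "virtual token" embedding per prompt position.
        self.prompt = nn.Parameter(torch.randn(n_tokens, d_model) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq, d_model) -> (batch, n_tokens + seq, d_model)
        batch = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)

# Tiny stand-in for a frozen protein language model (assumption, not ProtGPT2/RITA).
vocab, d_model, n_prompt = 25, 16, 4  # ~20 amino acids + special tokens (assumed)
embed = nn.Embedding(vocab, d_model)
head = nn.Linear(d_model, vocab)
for p in list(embed.parameters()) + list(head.parameters()):
    p.requires_grad = False  # base model stays frozen

soft_prompt = SoftPrompt(n_prompt, d_model)
opt = torch.optim.Adam(soft_prompt.parameters(), lr=1e-3)  # only prompt is trained

tokens = torch.randint(0, vocab, (2, 10))  # dummy batch of protein token ids
x = soft_prompt(embed(tokens))             # (2, n_prompt + 10, d_model)
logits = head(x)
loss = logits.pow(2).mean()  # placeholder loss; a real setup uses next-token CE
loss.backward()
opt.step()
```

Because gradients reach only the prompt parameters, one frozen base model can serve many downstream tasks, each with its own small learned prompt, which is the computational saving the abstract contrasts with finetuning a separate model per task.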

Copyright belongs to the original authors. Visit the link for more info.

Podcast created by Paper Player, LLC