
MuLan-Methyl - Multiple Transformer-based Language Models for Accurate DNA Methylation Prediction

2023/1/4

PaperPlayer biorxiv bioinformatics

Shownotes

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.01.04.522704v1?rss=1

Authors: Zeng, W., Gautam, A., Huson, D. H.

Abstract: Transformer-based language models are successfully used to address massive text-related tasks. DNA methylation is an important epigenetic mechanism, and its analysis provides valuable insights into gene regulation and biomarker identification. Several deep learning-based methods have been proposed to identify DNA methylation, each seeking to strike a balance between computational effort and accuracy. Here, we introduce MuLan-Methyl, a deep learning framework for predicting DNA methylation sites that is based on multiple (five) popular transformer-based language models. The framework identifies methylation sites for three different types of DNA methylation, namely N6-adenine (6mA), N4-cytosine (4mC), and 5-hydroxymethylcytosine (5hmC). Each of the five employed language models is adapted to the task using the "pre-train and fine-tune" paradigm. Pre-training is performed on a custom corpus consisting of DNA fragments and taxonomic lineages using self-supervised learning. Fine-tuning then aims at predicting the DNA methylation status for each type. The five models are used to collectively predict the DNA methylation status. We report excellent performance of MuLan-Methyl on a benchmark dataset. Moreover, we show that the model captures characteristic differences between species that are relevant for methylation. This work demonstrates that language models can be successfully adapted to this domain of application and that joint utilization of different language models improves model performance.
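The abstract describes an ensemble in which five fine-tuned language models collectively predict methylation status. As a minimal sketch of such an ensemble step, the snippet below assumes each model emits a per-site methylation probability and that the predictions are combined by simple averaging; the exact aggregation rule is not specified in the shownotes, so the function and its averaging scheme are illustrative assumptions, not the paper's method.

```python
# Hypothetical sketch of an ensemble prediction step (assumption:
# each of the five models outputs a per-site probability, and the
# joint prediction is the thresholded average of those probabilities).

def ensemble_predict(per_model_probs, threshold=0.5):
    """Average per-site probabilities across models and threshold.

    per_model_probs: list of lists, one inner list per model, each
    holding one probability per candidate methylation site.
    Returns (avg_probs, labels), where labels[i] is 1 if site i is
    predicted methylated.
    """
    n_models = len(per_model_probs)
    n_sites = len(per_model_probs[0])
    avg_probs = [
        sum(model[i] for model in per_model_probs) / n_models
        for i in range(n_sites)
    ]
    labels = [1 if p >= threshold else 0 for p in avg_probs]
    return avg_probs, labels

# Example: five models each scoring three candidate 6mA sites.
probs = [
    [0.9, 0.2, 0.6],
    [0.8, 0.3, 0.5],
    [0.7, 0.1, 0.7],
    [0.9, 0.4, 0.4],
    [0.8, 0.2, 0.6],
]
avg, labels = ensemble_predict(probs)  # labels -> [1, 0, 1]
```

Averaging is only one of several plausible combination rules; majority voting over per-model labels, or a learned meta-classifier on the five probabilities, would slot into the same interface.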

Copyright belongs to the original authors. Visit the link for more info.

Podcast created by Paper Player, LLC