Benchmarking Variational AutoEncoders on cancer transcriptomics data

2023/2/10
PaperPlayer biorxiv bioinformatics

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.02.09.527832v1?rss=1

Authors: elTager, M., Abdelaal, T., Charrout, M., Mahfouz, A., Reinders, M., Makrodimitris, S.

Abstract: Deep generative models, such as variational autoencoders (VAEs), have gained increasing attention in computational biology due to their ability to capture complex data manifolds, which can subsequently be used to achieve better performance in downstream tasks such as cancer type prediction or cancer subtyping. However, these models are difficult to train due to the large number of hyperparameters that need to be tuned. To get a better understanding of the importance of the different hyperparameters, we examined six different VAE models trained on TCGA transcriptomics data and evaluated them on the downstream task of cluster agreement with cancer subtypes. We studied the effect of the latent space dimensionality, learning rate, optimizer, and initialization on the quality of subsequent clustering of the TCGA samples. We found β-TCVAE and DIP-VAE to be sensitive to hyperparameter selection. Based on these experiments, we derived recommendations for selecting the different hyperparameter settings. In addition, we examined whether the learned latent spaces capture biologically relevant information. To this end, we correlated the different representations with various data characteristics such as age, days to metastasis, immune infiltration, and mutation signatures. We found that, for all models, the latent factors in general do not uniquely correlate with one of the data characteristics, even for models specifically designed for disentanglement.
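
The abstract does not say which score was used to measure "cluster agreement with cancer subtypes". As an illustration only, the sketch below assumes the adjusted Rand index (ARI), one common choice for comparing a clustering against known labels. The function name and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same samples.

    Assumption: ARI stands in for the paper's (unspecified) agreement score.
    1.0 means identical partitions, about 0.0 means chance-level agreement.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same samples")

    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two samples: partitions trivially agree

    # Pairs of samples that share a group in both labelings, in the
    # reference labeling only, and in the predicted labeling only.
    same_in_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    same_in_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    same_in_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = same_in_true * same_in_pred / total_pairs
    maximum = 0.5 * (same_in_true + same_in_pred)
    if maximum == expected:
        return 1.0  # degenerate case, e.g. every sample in one group in both
    return (same_in_both - expected) / (maximum - expected)


# Toy example: hypothetical subtype labels vs. cluster IDs that would come
# from clustering a VAE latent space.
subtypes = ["LumA", "LumA", "Basal", "Basal"]
clusters = [1, 1, 0, 0]
print(adjusted_rand_index(subtypes, clusters))
```

Because ARI is invariant to how clusters are numbered, the cluster IDs do not need to be matched to subtype names before scoring.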

Copyright belongs to the original authors. Visit the link for more info.

Podcast created by Paper Player, LLC