MIDAS: a fast and simple simulator for realistic microbiome data

2023/3/25
PaperPlayer biorxiv bioinformatics

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.03.23.533996v1?rss=1

Authors: He, M., Satten, G., Zhao, N.

Abstract: Advances in sequencing technology have led to the discovery of associations between the human microbiota and many diseases, conditions and traits. With the increasing availability of microbiome data, many statistical methods have been developed for studying these associations. The growing number of newly developed methods highlights the need for simple, rapid and reliable methods to simulate realistic microbiome data, which is essential for validating and evaluating the performance of these methods. However, generating realistic microbiome data is challenging due to the complex nature of microbiome data, which feature correlation between taxa, sparsity, overdispersion, and compositionality. Current methods for simulating microbiome data are deficient in their ability to capture these important features of microbiome data, or can require exorbitant computational time. We develop MIDAS (MIcrobiome DAta Simulator), a fast and simple approach for simulating realistic microbiome data that reproduces the distributional and correlation structure of a template microbiome dataset. We demonstrate improved performance of MIDAS relative to other existing methods using gut and vaginal data. MIDAS has three major advantages. First, MIDAS performs better than other methods in reproducing the distributional features of real data, at both the presence-absence level and the relative-abundance level. MIDAS-simulated data are more similar to the template data than data simulated by competing methods, as quantified using a variety of measures. Second, MIDAS makes no distributional assumption about the relative abundances, and thus can easily accommodate complex distributional features in real data. Third, MIDAS is computationally efficient and can be used to simulate large microbiome datasets.
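
To make the idea of template-based simulation concrete, here is a minimal Python sketch. It is not the MIDAS algorithm and does not use the MIDAS software; it is a generic rank-based Gaussian copula with empirical marginals, chosen only because it needs no parametric assumption about the relative abundances. The function name `simulate_from_template`, its parameters, and the tie-breaking jitter are all assumptions made for illustration.

```python
import numpy as np

def simulate_from_template(template_counts, n_samples, seed=0):
    """Hypothetical helper: simulate relative abundances from a template.

    template_counts: (samples x taxa) count table; every sample is
    assumed to contain at least one read.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(template_counts, dtype=float)
    n_template, n_taxa = counts.shape

    # Compositionality: work with relative abundances (rows sum to 1).
    rel = counts / counts.sum(axis=1, keepdims=True)

    # Rank each taxon across samples; a tiny random jitter breaks the
    # many ties at zero so sparse taxa do not look spuriously correlated.
    jitter = rng.random(rel.shape) * 1e-12
    ranks = (rel + jitter).argsort(axis=0).argsort(axis=0).astype(float)

    # Rank correlation between taxa, used as the copula correlation.
    corr = np.nan_to_num(np.corrcoef(ranks, rowvar=False))
    np.fill_diagonal(corr, 1.0)

    # Correlated Gaussian scores, turned into per-taxon uniform scores
    # through their ranks within the simulated sample.
    z = rng.multivariate_normal(np.zeros(n_taxa), corr, size=n_samples,
                                check_valid="ignore")
    u = (z.argsort(axis=0).argsort(axis=0) + 0.5) / n_samples

    # Empirical inverse CDF: each simulated value is an observed template
    # value for that taxon, so zeros (sparsity) carry over.
    sorted_rel = np.sort(rel, axis=0)
    idx = (u * n_template).astype(int)
    sim = np.take_along_axis(sorted_rel, idx, axis=0)

    # Renormalise so each simulated sample is again a composition.
    totals = sim.sum(axis=1, keepdims=True)
    return sim / np.where(totals > 0, totals, 1.0)
```

Calling `simulate_from_template(template, 200)` on a samples-by-taxa count table returns a 200-row table of relative abundances. Because every simulated value is drawn from the template's own observed values for that taxon, the sketch inherits the template's sparsity and skew without fitting a parametric model.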

Copyright belongs to the original authors. Visit the link for more info.

Podcast created by Paper Player, LLC