Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.01.12.523792v1?rss=1
Authors: Webel, H., Niu, L., Bach Nielsen, A., Locard-Paulet, M., Mann, M., Juhl Jensen, L., Rasmussen, S.
Abstract: Imputation techniques provide means to replace missing measurements with a value and are used in almost all downstream analyses of mass spectrometry (MS) based proteomics data using label-free quantification (LFQ). Some methods impute only under the assumption that the limit of detection (LOD) was not passed and therefore impute missing values with too low or too high intensities, potentially leading to biased results in downstream statistical analysis. Here we test how self-supervised deep learning models can impute missing values in the context of LFQ at different levels: precursors, aggregated peptides or protein groups. We demonstrate how collaborative filtering, denoising autoencoders, and variational autoencoders can be used to reconstruct missing values and can make more relevant features available for downstream analysis compared to current approaches. Additionally, we show that deep learning approaches can model data in its entirety for imputation and offer an approach for controlled evaluation of imputation approaches. We applied our method, proteomics imputation modeling mass spectrometry (PIMMS), to an alcohol-related liver disease (ALD) cohort with blood plasma proteomics data available for 358 individuals. We identified 49 additional proteins (+52.7%) that are significantly differentially abundant across disease stages compared to traditional methods and found that some of these were predictive of ALD progression in machine learning models. We therefore suggest the use of deep learning approaches for imputing missing values in MS-based proteomics and provide workflows for these.
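The "controlled evaluation of imputation approaches" mentioned in the abstract can be illustrated with a minimal sketch: hide a fraction of the values that were actually measured, impute them, and score the imputation against the hidden truth. This is not the PIMMS implementation. The synthetic intensity matrix, the feature-wise median baseline, the 10% hold-out fraction, and all function names below are assumptions made for illustration only.

```python
import numpy as np

def mask_observed(X, frac, rng):
    """Hide a random fraction of the observed (non-NaN) entries of X.

    Returns the masked copy and a boolean array marking the hidden cells.
    """
    observed = np.argwhere(~np.isnan(X))
    n_hide = int(round(frac * len(observed)))
    chosen = observed[rng.choice(len(observed), size=n_hide, replace=False)]
    hidden = np.zeros(X.shape, dtype=bool)
    hidden[chosen[:, 0], chosen[:, 1]] = True
    X_masked = X.copy()
    X_masked[hidden] = np.nan
    return X_masked, hidden

def median_impute(X):
    """Baseline: fill each feature (column) with the median of its observed values."""
    med = np.nanmedian(X, axis=0)
    return np.where(np.isnan(X), med, X)

def held_out_mae(X_true, X_imputed, hidden):
    """Mean absolute error on the cells that were deliberately hidden."""
    return float(np.mean(np.abs(X_true[hidden] - X_imputed[hidden])))

# Synthetic stand-in for log-transformed intensities (samples x features),
# with about 20% of the values missing. Purely illustrative data.
rng = np.random.default_rng(0)
X = rng.normal(loc=25.0, scale=2.0, size=(50, 20))
X[rng.random(X.shape) < 0.2] = np.nan

X_masked, hidden = mask_observed(X, frac=0.1, rng=rng)
X_imputed = median_impute(X_masked)
mae = held_out_mae(X, X_imputed, hidden)
print(f"held-out MAE of median baseline: {mae:.3f}")
```

Any other imputation method, including a learned model, could be swapped in for `median_impute` and compared on the same hidden cells.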
Copyright belongs to the original authors. Visit the link for more info.
Podcast created by Paper Player, LLC