Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.07.04.547623v1?rss=1
Authors: Capraz, T., Huber, W.
Abstract:
Motivation: A fundamental step in many analyses of high-dimensional data is dimension reduction. Two basic approaches are the introduction of new, synthetic coordinates and the selection of extant features. Advantages of the latter include interpretability, simplicity, transferability and modularity. A common criterion for unsupervised feature selection is variance or dynamic range. In practice, however, high-variance features can be noisy, important features can have low variance, and variances may simply not be comparable across features because they are measured on unrelated numeric scales or in different physical units. Moreover, users may want to include measures of signal-to-noise ratio and non-redundancy in feature selection.
Results: Here, we introduce the RNR algorithm, which selects features based on (i) the reproducibility of their signal across replicates and (ii) their non-redundancy, measured by linear independence. It takes as input a typically large set of features measured on a collection of objects, with two or more replicates per object. It returns an ordered list of features i1, i2, ..., ik, where i1 is the feature with the highest reproducibility across replicates, i2 the one with the highest reproducibility across replicates after projecting out the dimension spanned by i1, and so on. Applications to microscopy-based imaging of cells and to proteomics experiments highlight the benefits of the approach.
Availability: The RNR method is implemented in the R package FeatSeekR and is available via Bioconductor (Huber et al., 2015) under the GPL-3 open-source license.
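The greedy selection loop summarized in the abstract can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the FeatSeekR implementation. The abstract does not define the reproducibility score, so the sketch assumes a one-way ANOVA F-ratio (between-object over within-object variance). It also assumes that "projecting out" means orthogonally projecting each feature, stacked over objects and replicates, onto the complement of the span of the features already selected. The names `reproducibility_f` and `select_features` and the `tol` parameter are hypothetical.

```python
import numpy as np


def reproducibility_f(x: np.ndarray) -> float:
    # ASSUMED score (not taken from the paper): one-way ANOVA F-ratio of
    # between-object to within-object variance for a single feature.
    # x has shape (n_objects, n_replicates); needs >= 2 objects and replicates.
    n_obj, n_rep = x.shape
    obj_means = x.mean(axis=1)
    grand_mean = x.mean()
    ms_between = n_rep * np.sum((obj_means - grand_mean) ** 2) / (n_obj - 1)
    ms_within = np.sum((x - obj_means[:, None]) ** 2) / (n_obj * (n_rep - 1))
    if ms_within == 0:
        return np.inf if ms_between > 0 else 0.0
    return float(ms_between / ms_within)


def select_features(data: np.ndarray, k: int, tol: float = 1e-8) -> list[int]:
    """Hypothetical greedy ranking by reproducibility and non-redundancy.

    data: array of shape (n_objects, n_replicates, n_features).
    Returns up to k feature indices in the order they were selected.
    """
    n_obj, n_rep, n_feat = data.shape
    # Each feature becomes one column of length n_objects * n_replicates
    # (row index = object * n_replicates + replicate).
    flat = data.reshape(n_obj * n_rep, n_feat).astype(float)
    flat = flat - flat.mean(axis=0)  # center each feature
    norms = np.linalg.norm(flat, axis=0)
    residual = flat.copy()
    selected: list[int] = []
    for _ in range(min(k, n_feat)):
        scores = np.full(n_feat, -np.inf)
        for j in range(n_feat):
            if j in selected:
                continue
            # Skip features that are (almost) linear combinations of those
            # already selected: their residual is numerically zero.
            if np.linalg.norm(residual[:, j]) <= tol * norms[j]:
                continue
            scores[j] = reproducibility_f(residual[:, j].reshape(n_obj, n_rep))
        best = int(np.argmax(scores))
        if scores[best] == -np.inf:
            break  # no non-redundant feature left
        selected.append(best)
        # Orthonormal basis of the span of all selected (original) features,
        # then remove that span from every feature.
        basis, _ = np.linalg.qr(flat[:, selected])
        residual = flat - basis @ (basis.T @ flat)
    return selected
```

With `data` of shape (objects, replicates, features), `select_features(data, k=10)` returns up to ten feature indices in selection order. A feature that is an exact linear combination of earlier picks is left with a near-zero residual and is skipped, which is how non-redundancy enters the ranking in this sketch. Because the F-ratio is scale-invariant, features measured in different units are compared on an equal footing. For the actual scoring and stopping rules, consult the FeatSeekR package.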
Copyright belongs to the original authors. Visit the link for more information.
Podcast created by Paper Player, LLC