Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.03.23.533712v1?rss=1
Authors: Ren, T., Chen, C., Danilov, A. V., Liu, S., Guan, X., Du, S., Wu, X., Sherman, M. H., Spellman, P. T., Coussens, L. M., Adey, A. C., Mills, G. B., Wu, L.-Y., Xia, Z.
Abstract: Accurately identifying phenotype-relevant cell subsets from heterogeneous cell populations is crucial for delineating the underlying mechanisms driving biological or clinical phenotypes. Here, by deploying a learning with rejection strategy, we developed a novel supervised learning framework called PENCIL to identify subpopulations associated with categorical or continuous phenotypes from single-cell data. By embedding a feature selection function into this flexible framework, for the first time, we were able to simultaneously select informative features and identify cell subpopulations, enabling accurate identification of phenotypic subpopulations otherwise missed by methods incapable of concurrent gene selection. Furthermore, the regression mode of PENCIL presents a novel ability for supervised phenotypic trajectory learning of subpopulations from single-cell data. We conducted comprehensive simulations to evaluate PENCIL's versatility in simultaneous gene selection, subpopulation identification and phenotypic trajectory prediction. PENCIL is fast and scalable to analyze one million cells within one hour. Using the classification mode, PENCIL detected T-cell subpopulations associated with melanoma immunotherapy outcomes. Moreover, when applied to scRNA-seq of a mantle cell lymphoma patient with drug treatment across multiple time points, the regression mode of PENCIL revealed a transcriptional treatment response trajectory. Collectively, our work introduces a scalable and flexible infrastructure to accurately identify phenotype-associated subpopulations from single-cell data.
Copy rights belong to original authors. Visit the link for more info
Podcast created by Paper Player, LLC