Prioritizing Complex Disease Genes from Heterogeneous Public Databases

2023/2/10

PaperPlayer biorxiv bioinformatics

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.02.09.527562v1?rss=1

Authors: Gong, E. L., Chen, J. Y.

Abstract:

Background: Complex human diseases are defined not only by sophisticated patterns of genetic variants/mutations upstream but also by many interplaying genes, RNAs, and proteins downstream. Analyzing multiple genomic and functional genomic data types to determine a short list of genes or molecules of interest is a common task in biology called "gene prioritization". Many statistical, biological, and bioinformatic methods have been developed to perform gene prioritization. However, little research has examined the relationships among the technique used, the merged or separate use of each data modality, the network/pathway context of the gene list, and the various gene ranking and expansion strategies.

Methods: We introduce a new analytical framework called "Gene Ranking and Iterative Prioritization based on Pathways" (GRIPP) to prioritize genes derived from different modalities. Multiple data sources, such as cBioPortal, PAGER, and COSMIC, were used to compile the initial gene list. We used the PAGER software to expand the gene list based on biological pathways, and the BEERE software to construct protein-protein interaction networks that include the gene list and to rank-order the genes. Starting from an initial draft gene list, we iteratively produced a final gene list for each data modality, using glioblastoma multiforme (GBM) as a case study.

Conclusion: We demonstrated that GBM gene lists obtained from three modalities (differential gene expression, gene mutations, and copy number alterations) and several data sources could be iteratively expanded and ranked using GRIPP. While integrating various data modalities can be useful for generating an integrated ranked gene list for a specific disease, the integration may also decrease the overall significance of ranked genes derived from specific data modalities. Therefore, we recommend carefully sorting and integrating gene lists according to each modality, such as gene mutations, epigenetic controls, or differential expression, to obtain modality-specific biological insights into the prioritized genes.
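
The expand-then-rank loop described in the Methods can be pictured with a small toy sketch. This is not the authors' GRIPP implementation and does not call PAGER or BEERE: the pathway table, the interaction edges, the gene names, the `min_overlap` and `top_n` parameters, and the use of plain degree counting as the ranking score are all assumptions made for illustration.

```python
def expand_by_pathways(genes, pathways, min_overlap=2):
    """Add all members of any pathway sharing >= min_overlap genes with the list.

    Stand-in for pathway-based expansion; `pathways` is a hypothetical
    dict mapping pathway name -> set of gene symbols.
    """
    seed = set(genes)
    expanded = set(seed)
    for members in pathways.values():
        if len(seed & members) >= min_overlap:
            expanded |= members
    return expanded


def rank_by_degree(genes, edges):
    """Rank genes by how many interaction partners they have inside the list.

    Degree counting is a simple stand-in score, not the ranking BEERE uses.
    Ties are broken alphabetically so the result is deterministic.
    """
    degree = {g: 0 for g in genes}
    for a, b in edges:
        if a != b and a in degree and b in degree:
            degree[a] += 1
            degree[b] += 1
    return sorted(degree, key=lambda g: (-degree[g], g))


def iterate_prioritization(seed, pathways, edges,
                           min_overlap=2, top_n=5, max_rounds=10):
    """Alternate expansion and ranking until the gene set stops changing."""
    current = sorted(set(seed))
    for _ in range(max_rounds):
        expanded = expand_by_pathways(current, pathways, min_overlap)
        ranked = rank_by_degree(expanded, edges)[:top_n]
        if set(ranked) == set(current):
            return ranked  # converged: same genes as the previous round
        current = ranked
    return current


# Toy inputs with placeholder gene names (no biological meaning intended).
TOY_PATHWAYS = {
    "P1": {"A", "B", "C"},
    "P2": {"C", "D", "E"},
    "P3": {"F", "G"},
}
TOY_EDGES = [("A", "B"), ("A", "C"), ("B", "C"),
             ("C", "D"), ("D", "E"), ("F", "G")]

if __name__ == "__main__":
    print(iterate_prioritization(["A", "B"], TOY_PATHWAYS, TOY_EDGES))
```

Running the loop separately on each modality's draft list, rather than on one merged list, mirrors the abstract's recommendation to keep modality-specific rankings.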

Copyright belongs to the original authors. Visit the link for more info.

Podcast created by Paper Player, LLC