Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.01.09.523362v1?rss=1
Authors: Perera, R., Perera, R.
Abstract: The breast cancer mortality rate is high in developing countries because early detection of cancer in patients is deficient. Identifying the genes that drive cancer is one of the major approaches to the early detection of breast cancer. Several computational tools have been developed to predict cancer driver genes. However, there is no gold-standard method for identifying breast cancer driver genes. Therefore, this study aims to develop a model to predict high-confidence breast cancer driver genes using already-developed computational tools and already-produced breast cancer data. Primary breast cancer data were retrieved from The Cancer Genome Atlas Program (TCGA). Here we use twenty-seven different gene prediction tools that calculate each gene's effect and variant in the primary gene dataset. The primary dataset serves as the input for each tool. The results retrieved from each tool are recorded as the secondary dataset. The latest breast cancer driver gene set was retrieved from DriverDBv3 and included as the target attribute. Training and testing subsets were selected from the secondary dataset using k-fold cross-validation. Attributes from the secondary dataset were ranked according to their correlation with breast cancer driver genes. The ranked data were used to train different supervised machine-learning algorithms. The attributes and the learning algorithm that produced the highest classification accuracy were selected to build the new model, BReast cancer Driver (BRDriver). BRDriver achieved an area under the curve of 0.999 and a classification accuracy of 0.999 on breast invasive carcinoma data. TP53 was the breast cancer driver gene most frequently predicted by BRDriver (n=246). Interestingly, BRDriver predicted the CDKN2A and NFE2L2 genes as new breast cancer driver genes. Further in vivo and in vitro studies are required to determine whether these genes are indeed cancer driver genes.
Many computational tools have been developed to identify cancer driver genes and variants. A few methodologies have been developed to combine these tools and increase efficiency in detecting breast cancer driver genes. Our BRDriver overcomes these difficulties and produces high-confidence breast cancer driver genes.
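The workflow outlined in the abstract (rank the tool-derived attributes by their correlation with the driver-gene label, then evaluate a supervised learner with k-fold cross-validation) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the data are synthetic, the function names are hypothetical, and a nearest-centroid classifier stands in for whichever learning algorithms the authors actually compared. Ranking inside each training fold is also a choice made here, to keep test data out of the attribute selection.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0


def rank_attributes(rows, labels):
    """Order attribute indices by |correlation| with the label, strongest first."""
    n_attr = len(rows[0])
    scores = [abs(pearson([r[j] for r in rows], labels)) for j in range(n_attr)]
    return sorted(range(n_attr), key=lambda j: scores[j], reverse=True)


def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k shuffled, non-overlapping folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, folds[i]


def centroid_predict(train_rows, train_labels, row):
    """Nearest-centroid classifier (a stand-in learner, not the paper's)."""
    best, best_dist = None, float("inf")
    for cls in set(train_labels):
        members = [r for r, y in zip(train_rows, train_labels) if y == cls]
        centroid = [statistics.fmean(col) for col in zip(*members)]
        dist = sum((a - b) ** 2 for a, b in zip(row, centroid))
        if dist < best_dist:
            best, best_dist = cls, dist
    return best


def cross_validated_accuracy(rows, labels, top_n, k=5):
    """Accuracy over k folds, using the top_n attributes ranked on each training fold."""
    correct = 0
    for train, test in k_fold_indices(len(rows), k):
        train_rows = [rows[i] for i in train]
        train_labels = [labels[i] for i in train]
        keep = rank_attributes(train_rows, train_labels)[:top_n]
        train_sel = [[r[j] for j in keep] for r in train_rows]
        for i in test:
            pred = centroid_predict(train_sel, train_labels, [rows[i][j] for j in keep])
            correct += pred == labels[i]
    return correct / len(rows)


# Synthetic stand-in for the "secondary dataset": one row per gene, one column
# per tool score. Column 0 tracks the label; columns 1 and 2 are pure noise.
rng = random.Random(42)
labels = [i % 2 for i in range(200)]  # 1 = known driver gene (hypothetical)
rows = [[y + rng.gauss(0, 0.3), rng.gauss(0, 1), rng.gauss(0, 1)] for y in labels]

print("attribute ranking:", rank_attributes(rows, labels))
print("5-fold accuracy, top attribute only:", cross_validated_accuracy(rows, labels, top_n=1))
```

In a real analysis the rows would hold the outputs of the prediction tools and the labels would come from a curated driver-gene set such as the DriverDBv3 list mentioned in the abstract.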
Copyright belongs to the original authors. Visit the link for more info.
Podcast created by Paper Player, LLC