GeneMark-ETP: Automatic Gene Finding in Eukaryotic Genomes in Consistence with Extrinsic Data

2023/1/15
PaperPlayer biorxiv bioinformatics

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.01.13.524024v1?rss=1

Authors: Bruna, T., Lomsadze, A., Borodovsky, M.

Abstract: GeneMark-ETP is a computational tool developed to find genes in eukaryotic genomes consistently with genomic, transcriptomic, and protein-derived evidence. The earlier tools GeneMark-ET and GeneMark-EP+ addressed narrower data-integration tasks, working either with fragments of transcripts (short RNA reads) or with homologous protein sequences. Both transcript- and protein-derived evidence are unevenly distributed across a genome. Therefore, GeneMark-ETP first finds the genomic loci where extrinsic data help identify genes with "high confidence" and then analyzes the regions between the high-confidence genes. If the number of high-confidence genes is sufficiently large, the GHMM (generalized hidden Markov model) training is done in a single iteration. Otherwise, several iterations of self-training are necessary before making the final prediction of the whole gene complement. Since the difficulty of gene prediction increases significantly in large plant and animal genomes, the new development focused on large genomes. GeneMark-ETP's performance compared favorably with that of GeneMark-ET, GeneMark-EP+, BRAKER1, and BRAKER2, methods that each use a single type of extrinsic evidence. A comparison was also made with TSEBRA, a tool that constructs an optimal combination of gene predictions made by BRAKER1 and BRAKER2 and thus uses both transcript- and protein-derived evidence.
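The control flow described in the abstract can be illustrated with a small Python sketch. This is not GeneMark-ETP code: the `Gene` class, the function names, and the `min_hc_genes` threshold are all hypothetical, and the abstract does not state what count of high-confidence genes is "sufficiently large". The sketch only shows the two ideas mentioned: choosing a training mode from the number of high-confidence genes, and listing the regions between those genes that remain to be analyzed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Gene:
    """Hypothetical record for a high-confidence gene (1-based, inclusive)."""
    start: int
    end: int


def training_mode(num_hc_genes: int, min_hc_genes: int) -> str:
    """Pick a training mode; min_hc_genes is an assumed, illustrative cutoff."""
    if num_hc_genes >= min_hc_genes:
        return "single-iteration"
    return "iterative self-training"


def regions_between(hc_genes: List[Gene], genome_length: int) -> List[Tuple[int, int]]:
    """Return the intervals not covered by any high-confidence gene."""
    regions = []
    cursor = 1  # first position not yet covered by a gene
    for gene in sorted(hc_genes, key=lambda g: g.start):
        if gene.start > cursor:
            regions.append((cursor, gene.start - 1))
        # max() keeps the cursor correct when genes overlap
        cursor = max(cursor, gene.end + 1)
    if cursor <= genome_length:
        regions.append((cursor, genome_length))
    return regions


if __name__ == "__main__":
    genes = [Gene(100, 200), Gene(500, 800)]
    print(training_mode(len(genes), min_hc_genes=1000))
    print(regions_between(genes, genome_length=1000))
```

In this toy setup, two high-confidence genes fall below the assumed cutoff, so the sketch selects the iterative mode and reports three regions left for further analysis.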

Copyrights belong to the original authors. Visit the link for more info.

Podcast created by Paper Player, LLC