GraphPart: Homology partitioning for biological sequence analysis

2023/4/17
PaperPlayer biorxiv bioinformatics

Shownotes

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.04.14.536886v1?rss=1

Authors: Teufel, F., Gislason, M. H., Almagro Armenteros, J. J., Johansen, A. R., Winther, O., Nielsen, H.

Abstract: When splitting biological sequence data for the development and testing of predictive models, it is necessary to avoid too closely related pairs of sequences ending up in different partitions. If this is ignored, performance estimates of prediction methods will tend to be exaggerated. Several algorithms have been proposed for homology reduction, where sequences are removed until no too closely related pairs remain. We present GraphPart, an algorithm for homology partitioning, where as many sequences as possible are kept in the dataset, but partitions are defined such that closely related sequences always end up in the same partition. Evaluation of GraphPart on protein, DNA and RNA datasets shows that it is capable of retaining a larger number of sequences per dataset, while providing homology separation quality on par with reduction approaches.
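
To make the partitioning constraint in the abstract concrete, here is a minimal Python sketch. It is not the GraphPart algorithm, whose details are in the linked paper; it is a simplified stand-in that groups sequences into connected components of a "too closely related" graph and then distributes whole components across partitions. The function name `homology_partition`, the sequence ids, and the precomputed `related_pairs` list (assumed to come from some alignment or similarity threshold) are all hypothetical.

```python
from collections import defaultdict


def homology_partition(ids, related_pairs, n_partitions):
    """Assign sequences to partitions so that every related pair shares one.

    Simplified illustration only: connected components plus greedy balancing.
    No sequence is removed, so partition sizes may be uneven.
    """
    # Union-find structure: each sequence starts as its own cluster.
    parent = {s: s for s in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge the clusters of every pair flagged as too closely related.
    for a, b in related_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Collect the members of each cluster.
    clusters = defaultdict(list)
    for s in ids:
        clusters[find(s)].append(s)

    # Place the largest clusters first, each into the currently smallest
    # partition. Clusters are never split.
    partitions = [[] for _ in range(n_partitions)]
    for members in sorted(clusters.values(), key=len, reverse=True):
        smallest = min(partitions, key=len)
        smallest.extend(members)
    return partitions


# Hypothetical toy input: six sequences, three related pairs.
ids = ["s1", "s2", "s3", "s4", "s5", "s6"]
related_pairs = [("s1", "s2"), ("s2", "s3"), ("s4", "s5")]
partitions = homology_partition(ids, related_pairs, n_partitions=2)
print(partitions)
```

In this toy input all six sequences are kept, and s1, s2 and s3 land in the same partition because they are linked through s2. A reduction approach would instead have to discard sequences until no related pair remained.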

Copyright belongs to the original authors. Visit the link for more information.

Podcast created by Paper Player, LLC