Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.02.15.528695v1?rss=1
Authors: Toonsi, S., Kafkas, S., Hoehndorf, R.
Abstract: Motivation: Concept recognition in biomedical text is an important yet challenging task. The two main approaches to recognize concepts in text are dictionary-based approaches and supervised machine learning approaches. While dictionary-based approaches fail in recognising new concepts and variations of existing concepts, supervised methods require sufficiently large annotated datasets which are expensive to obtain. Methods based on distant supervision have been developed to use machine learning without large annotated corpora. However, for biomedical concept recognition, these approaches do not yet exploit the context in which a concept occurs in literature, and they do not make use of prior knowledge about dependencies between concepts. Results: We developed BORD, a Biomedical Ontology-based method for concept Recognition using Distant supervision. BORD utilises context from corpora which are lexically annotated using labels and synonyms from the classes of a biomedical ontology for model training. Furthermore, BORD utilises the ontology hierarchy for normalising the recognised mentions to their concept identifiers. We show how our method improves the performance of state-of-the-art methods for recognising disease and phenotype concepts in biomedical literature. Our method is generic, does not require manually annotated corpora, and is robust in identifying mentions of ontology classes in text. Moreover, to the best of our knowledge, this is the first approach utilising the ontology hierarchy for concept recognition. Availability: BORD is publicly available from https://github.com/bio-ontology-research-group/BORD
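Illustration: the abstract mentions corpora that are "lexically annotated using labels and synonyms" from ontology classes. The sketch below shows one common way such weak labels can be produced, assuming simple longest-match dictionary lookup and BIO tags. It is not BORD's actual code: the toy ontology, the EX: identifiers, and the function names build_lexicon and weak_label are all hypothetical.

```python
import re

# Hypothetical toy ontology: class ID -> label and synonyms (illustrative only).
ONTOLOGY = {
    "EX:0001": ["type 2 diabetes mellitus", "type 2 diabetes"],
    "EX:0002": ["obesity"],
}


def build_lexicon(ontology):
    """Map each lowercased label or synonym (as a token tuple) to its class ID."""
    return {
        tuple(term.lower().split()): class_id
        for class_id, terms in ontology.items()
        for term in terms
    }


def weak_label(text, lexicon):
    """Tag tokens with BIO labels using longest-match dictionary lookup."""
    tokens = re.findall(r"\w+", text.lower())
    tags = ["O"] * len(tokens)
    max_len = max(len(key) for key in lexicon)
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            class_id = lexicon.get(tuple(tokens[i:i + n]))
            if class_id:
                tags[i] = "B-" + class_id
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + class_id
                i += n
                break
        else:
            # No dictionary entry starts at this token.
            i += 1
    return list(zip(tokens, tags))


lexicon = build_lexicon(ONTOLOGY)
labelled = weak_label("Obesity is a risk factor for type 2 diabetes.", lexicon)
```

Token and tag pairs like these could then serve as noisy training data for a sequence model, which is the general idea behind distant supervision. How BORD itself uses context and the ontology hierarchy is described in the paper and repository linked above.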
Copyright belongs to the original authors. Visit the link for more info.
Podcast created by Paper Player, LLC