LISTER: Semi-automatic metadata extraction from annotated experiment documentation in eLabFTW

2023/2/21
PaperPlayer biorxiv bioinformatics

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.02.20.529231v1?rss=1

Authors: Musyaffa, F. A., Rapp, K., Gohlke, H.

Abstract: The availability of scientific methods, code, and data is key for reproducing an experiment. Research data should be made available following the FAIR principles (findable, accessible, interoperable, and reusable). For that, the annotation of research data with metadata is central. However, existing research data management workflows often require that metadata be created by the corresponding researchers, which takes effort and time. Here, we developed LISTER as a methodological and algorithmic solution to disentangle the creation of metadata from ontology alignment and extract metadata from annotated template-based experiment documentation using minimum effort. We focused on tailoring the integration between existing platforms by using eLabFTW as the electronic lab notebook and adopting the ISA (investigation, study, assay) model as the abstract data model framework; DSpace is used as a data cataloging platform. LISTER consists of three components: customized eLabFTW entries using specific hierarchies, templates, and tags; a 'container' concept in eLabFTW, making metadata of a particular container content extractable along with its underlying, related containers; and a Python-based app to enable easy-to-use, semi-automated metadata extraction from eLabFTW entries. LISTER outputs metadata in machine-readable .json and human-readable .csv formats, and materials and methods (MM) descriptions in .docx format that could be used in a thesis or manuscript. The metadata can be used as a basis to create or extend ontologies, which, when applied to the published research data, will significantly enhance its value due to a more complete and holistic understanding of the data, and might also enable scientists to identify new connections and insights in their field. We applied LISTER to the fields of computational biophysical chemistry as well as protein biochemistry and molecular biology, and our concept should be extendable to other life science areas.
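To make the extraction idea more concrete, here is a minimal Python sketch of pulling key-value metadata out of annotated text and writing it as JSON and CSV. It is not LISTER's code: the `{key|value}` annotation syntax, the function names, and the sample entry are all assumptions made for illustration, and the real tool reads entries from eLabFTW rather than from a string.

```python
import csv
import io
import json
import re

# Hypothetical annotation syntax {key|value}; LISTER's actual syntax may differ.
ANNOTATION = re.compile(r"\{([^{}|]+)\|([^{}|]+)\}")


def extract_metadata(text):
    """Return (metadata rows, plain text with each annotation replaced by its value)."""
    rows = [
        {"key": key.strip(), "value": value.strip()}
        for key, value in ANNOTATION.findall(text)
    ]
    plain = ANNOTATION.sub(lambda match: match.group(2).strip(), text)
    return rows, plain


def to_json(rows):
    """Serialize metadata rows as machine-readable JSON."""
    return json.dumps(rows, indent=2)


def to_csv(rows):
    """Serialize metadata rows as human-readable CSV with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["key", "value"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample entry, standing in for annotated experiment documentation.
entry = (
    "The protein was dissolved in {buffer|Tris-HCl} at {pH|7.4} "
    "and stirred for {duration|30 min}."
)
rows, plain = extract_metadata(entry)
print(to_json(rows))
print(to_csv(rows))
print(plain)
```

The same annotated sentence yields both the structured metadata and a readable plain-text description, which mirrors the abstract's split between metadata files and prose that could go into a methods section.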

Copyright belongs to the original authors. Visit the link for more information.

Podcast created by Paper Player, LLC