cover of episode Automated Extraction and Visualization of Metabolic Networks from Biomedical Literature Using a Large Language Model

Automated Extraction and Visualization of Metabolic Networks from Biomedical Literature Using a Large Language Model

2023/6/29
logo of podcast PaperPlayer biorxiv bioinformatics

PaperPlayer biorxiv bioinformatics

Frequently requested episodes will be transcribed first

Shownotes Transcript

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.06.27.546560v1?rss=1

Authors: Phongwattana, T., Chan, J. H.

Abstract: The rapid growth of biomedical literature presents a significant challenge for researchers to extract and analyze relevant information efficiently. In this study, we explore the application of GPT, the large language model to automate the extraction and visualization of metabolic networks from a corpus of PubMed abstracts. Our objective is to provide a valuable tool for biomedical researchers to explore and understand the intricate metabolic interactions discussed in scientific literature. We begin by splitting a ton of the tokens within the corpus, as the GPT-3.5-Turbo model has a token limit of 4,000 per analysis. Through iterative prompt optimization, we successfully extract a comprehensive list of metabolites, enzymes, and proteins from the abstracts. To validate the accuracy and completeness of the extracted entities, our biomedical data domain experts compare them with the provided abstracts and ensure a fully matched result. Using the extracted entities, we generate a directed graph that represents the metabolic network including 3 types of metabolic events that consist of metabolic consumption, metabolic reaction, and metabolic production. The graph visualization, achieved through Python and NetworkX, offers a clear representation of metabolic pathways, highlighting the relationships between metabolites, enzymes, and proteins. Our approach integrates language models and network analysis, demonstrating the power of combining automated information extraction with sophisticated visualization techniques. The research contributions are twofold. Firstly, we showcase the ability of GPT-3.5-Turbo to automatically extract metabolic entities, streamlining the process of cataloging important components in metabolic research. Secondly, we present the generation and visualization of a directed graph that provides a comprehensive overview of metabolic interactions. This graph serves as a valuable tool for further analysis, comparison with existing pathways, and updating or refining metabolic networks. Our findings underscore the potential of large language models and network analysis techniques in extracting and visualizing metabolic information from scientific literature. This approach enables researchers to gain insights into complex biological systems, advancing our understanding of metabolic pathways and their components.

Copy rights belong to original authors. Visit the link for more info

Podcast created by Paper Player, LLC