High performance Legionella pneumophila source attribution using genomics-based machine learning classification

2023/3/22
PaperPlayer biorxiv bioinformatics

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.03.19.532693v1?rss=1

Authors: Buultjens, A. H., Vandelannoote, K., Mercoulia, K. H., Ballard, S. A., Sloggett, C., Howden, B., Seeman, T., Stinear, T. P.

Abstract: Fundamental to effective Legionnaires' disease outbreak control is the ability to rapidly identify the environmental source(s) of the causative agent, Legionella pneumophila. Genomics has revolutionized pathogen surveillance but L. pneumophila has a complex ecology and population structure that can limit source inference based on standard core genome phylogenetics. Here we present a powerful machine learning approach that assigns the geographical source of Legionnaires' disease outbreaks more accurately than current core genome comparisons. Models were developed upon 534 L. pneumophila genome sequences, including 149 genomes linked to 20 previously reported Legionnaires' disease outbreaks through detailed case investigations. Our classification models were developed in a cross-validation framework using only environmental L. pneumophila genomes. Assignments of clinical isolate geographic origins demonstrated high predictive sensitivity and specificity of the models, with no false positives or false negatives for 13 out of 20 outbreak groups, despite the presence of within-outbreak polyclonal population structure. Analysis of the same 534-genome panel with a conventional phylogenomic tree and a core genome multi-locus sequence type allelic distance-based classification approach revealed that our machine learning method had the highest overall classification performance (agreement with epidemiological information). Our multivariate statistical learning approach maximises use of genomic variation data and is thus well-suited for supporting Legionnaires' disease outbreak investigations.
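
The abstract reports performance per outbreak group in terms of sensitivity, specificity, false positives and false negatives. As a rough illustration of how such per-group figures can be tallied, here is a minimal Python sketch using one-vs-rest counting. It is not the authors' code: the function names `per_group_metrics` and `perfectly_classified`, the group labels "A", "B", "C", and the toy predictions are all hypothetical and stand in for real clinical-isolate assignments.

```python
def per_group_metrics(true_labels, predicted_labels):
    """One-vs-rest counts, sensitivity and specificity for each group.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have equal length")
    metrics = {}
    for group in sorted(set(true_labels)):
        tp = fp = fn = tn = 0
        for truth, pred in zip(true_labels, predicted_labels):
            if truth == group and pred == group:
                tp += 1  # isolate from this group, assigned to it
            elif truth == group:
                fn += 1  # isolate from this group, assigned elsewhere
            elif pred == group:
                fp += 1  # isolate from another group, assigned here
            else:
                tn += 1  # isolate from another group, assigned elsewhere
        metrics[group] = {
            "tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "sensitivity": tp / (tp + fn) if (tp + fn) else None,
            "specificity": tn / (tn + fp) if (tn + fp) else None,
        }
    return metrics


def perfectly_classified(metrics):
    """Groups with no false positives and no false negatives."""
    return [g for g, m in metrics.items() if m["fp"] == 0 and m["fn"] == 0]


# Toy data (assumed): epidemiological group vs. model-assigned group.
true_labels = ["A", "A", "A", "B", "B", "C", "C", "C"]
predicted_labels = ["A", "A", "A", "B", "C", "C", "C", "C"]

metrics = per_group_metrics(true_labels, predicted_labels)
perfect = perfectly_classified(metrics)
```

In this toy case one "B" isolate is assigned to "C", so "B" picks up a false negative and "C" a false positive, leaving only "A" perfectly classified. The abstract's "13 out of 20 outbreak groups" is a count of the same kind, taken over the real outbreak groups.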

Copyright belongs to the original authors. Visit the link for more information.

Podcast created by Paper Player, LLC