Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.07.01.547179v1?rss=1
Authors: Hu, Y., Wang, Y., Hu, X., Chao, H., Li, S., Ni, Q., Zhu, Y., Hu, Y., Zhao, Z., Chen, M.
Abstract: Many pathogenic bacteria use type IV secretion systems to deliver effectors (T4SEs) into the cytoplasm of eukaryotic cells and cause disease. Identifying effectors is a crucial step in understanding the mechanisms of bacterial pathogenicity, but it remains a major challenge. In this study, we used the full-length embedding features generated by six pre-trained protein language models to train classifiers that predict T4SEs, and compared their performance. An integrated pipeline, T4SEpp, was assembled from three modules: a module that searches for full-length, signal sequence and effector domain homologs of known T4SEs; a machine learning module based on hand-crafted features extracted from the signal sequences; and a third module containing the three best-performing pre-trained protein language models. T4SEpp outperforms other state-of-the-art software tools, achieving ~0.95 sensitivity at a high specificity of ~0.99 on an independent testing dataset. T4SEpp predicted 13 potential T4SEs, including the H. pylori cytotoxin-associated gene A (CagA). Among these, 10 have the potential to interact with at least one human protein, which suggests that these potential T4SEs may be associated with the pathogenicity of H. pylori. Overall, T4SEpp provides a better solution to assist the identification of bacterial T4SEs and facilitates studies of bacterial pathogenicity. T4SEpp is freely accessible at https://bis.zju.edu.cn/T4SEpp.
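Note on the metrics: the abstract reports sensitivity and specificity, which are the fraction of true T4SEs correctly predicted and the fraction of non-T4SEs correctly rejected. The sketch below shows how the two numbers are computed from binary labels. It is only an illustration: the function name and the toy labels are made up here and are not taken from the paper or from the T4SEpp code.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Assumed label convention (not from the paper): 1 = T4SE, 0 = non-T4SE.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # effectors found
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # effectors missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-effectors rejected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy labels invented for illustration; these are not data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Reporting sensitivity at a fixed high specificity, as the abstract does, matters for proteome-wide screens, where non-effectors vastly outnumber effectors and even a small false-positive rate produces many spurious candidates.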
Copyright belongs to the original authors. Visit the link for more info.
Podcast created by Paper Player, LLC