Bastion3: a two-layer approach for identifying type III secreted effectors using ensemble learning
Type III secreted effectors (T3SEs) can be injected into the host cell cytoplasm via type III secretion systems (T3SSs) to modulate interactions between Gram-negative bacterial pathogens and their hosts. Because of their relevance to pathogen-host interactions, significant computational effort has gone into identifying T3SEs, and this work has in turn stimulated new T3SE discoveries. However, as T3SEs with new characteristics are discovered, the existing computational tools reveal important limitations: (1) most of them trained machine learning models on the N-terminus (sometimes also incorporating the C-terminus) rather than on the proteins' complete sequences, and (2) the underlying models, trained with classic algorithms, used only a few features, most of which were extracted from sequence information alone. Thus, to achieve better T3SE prediction, we must identify more powerful, informative features and investigate how to integrate them effectively into a comprehensive model.
In this work, we present Bastion3, a two-layer ensemble predictor developed to accurately identify type III secreted effectors from protein sequence data. In contrast with existing methods that employ single models with few features, Bastion3 explores a wide range of features of various types, trains single models based on these features, and finally integrates these models through ensemble learning. Specifically, we trained the models using a recent gradient boosting machine, LightGBM, and further improved their performance through a novel two-step parameter optimization strategy based on a genetic algorithm (GA). Our benchmark test demonstrates that Bastion3 performs substantially better than commonly used methods, with an accuracy (ACC) of 0.959, F-value of 0.958, Matthews correlation coefficient (MCC) of 0.917 and area under the ROC curve (AUC) of 0.956, outperforming all other toolkits by more than 5.6% in ACC, 5.7% in F-value, 12.4% in MCC and 5.8% in AUC. Based on the proposed two-layer ensemble model, we further developed a user-friendly online toolkit that makes T3SE prediction convenient for experimental scientists. Designed to ease future discoveries of novel T3SEs and offering improved performance, Bastion3 is poised to become a widely used, state-of-the-art toolkit for T3SE prediction.
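To make the two-layer idea concrete, the sketch below shows one possible reading of it in plain Python. It is not the Bastion3 implementation: a toy nearest-centroid scorer stands in for LightGBM, simple score averaging stands in for the ensemble integration step, the GA-based parameter optimization is omitted, and the feature encoders, the `GROUPS` layout, the class names and the training sequences are all made up for illustration (the sequences are not real T3SEs).

```python
from collections import Counter
from math import dist
from statistics import mean

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFWC")   # illustrative grouping, an assumption
POLAR = set("STNQ")             # illustrative grouping, an assumption

# Feature encoders computed on the complete sequence, not only the N-terminus.
def aac(seq):
    """Amino acid composition: 20 relative frequencies."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]

def hydrophobic_fraction(seq):
    return [sum(c in HYDROPHOBIC for c in seq) / len(seq)]

def polar_fraction(seq):
    return [sum(c in POLAR for c in seq) / len(seq)]

class CentroidModel:
    """Toy single model (stand-in for LightGBM): score by distance to class centroids."""
    def __init__(self, encode):
        self.encode = encode

    def fit(self, seqs, labels):
        pos = [self.encode(s) for s, y in zip(seqs, labels) if y == 1]
        neg = [self.encode(s) for s, y in zip(seqs, labels) if y == 0]
        self.pos_c = [mean(col) for col in zip(*pos)]
        self.neg_c = [mean(col) for col in zip(*neg)]
        return self

    def score(self, seq):
        x = self.encode(seq)
        d_pos, d_neg = dist(x, self.pos_c), dist(x, self.neg_c)
        total = d_pos + d_neg
        # Score in [0, 1]; closer to the positive centroid gives a higher score.
        return 0.5 if total == 0 else d_neg / total

class TwoLayerEnsemble:
    """Layer 1 averages the models within a feature group; layer 2 averages the groups."""
    def __init__(self, groups):
        self.groups = {name: [CentroidModel(e) for e in encoders]
                       for name, encoders in groups.items()}

    def fit(self, seqs, labels):
        for models in self.groups.values():
            for m in models:
                m.fit(seqs, labels)
        return self

    def score(self, seq):
        layer1 = [mean(m.score(seq) for m in models)
                  for models in self.groups.values()]
        return mean(layer1)  # layer 2: integrate the group-level scores

    def predict(self, seq, threshold=0.5):
        return int(self.score(seq) >= threshold)

# Hypothetical feature groups and made-up toy sequences (1 = effector-like).
GROUPS = {"sequence": [aac],
          "physicochemical": [hydrophobic_fraction, polar_fraction]}
train_seqs = ["MSSTSPSQSNSTSS", "MSTNSSPSQTSSNS",
              "MLLVAIFLAVGLLA", "MAVLLIGFAVLLIA"]
train_labels = [1, 1, 0, 0]

model = TwoLayerEnsemble(GROUPS).fit(train_seqs, train_labels)
print(model.score("MSSSTNPSQSSTSS"), model.predict("MSSSTNPSQSSTSS"))
```

In a closer approximation, each `CentroidModel` would be replaced by a gradient boosting classifier trained on one feature encoding, with its hyperparameters tuned before the models are combined.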
The following browsers are supported by this website:
- Windows: Chrome, Firefox, Internet Explorer 8+, Opera
- Mac: Chrome, Firefox, Opera, Safari
- Linux: Chrome, Firefox
- Wang J et al. Bastion3: a two-layer approach for identifying type III secreted effectors using ensemble learning. Bioinformatics 2019;35(12):2017-2028. DOI: 10.1093/bioinformatics/bty914.
Lithgow Group
Infection and Immunity Program
Biomedicine Discovery Institute
Faculty of Medicine, Nursing and Health Sciences
Monash University
Melbourne, VIC 3800, Australia