Shen Lingjun, Li Wenming, Luo Yun, Zhang Huajie, Han Liuxin, Wang Ge, Zhao Yanhong, Huang Yuanqing, Li Shan, Li Longfen, Shi Chunjing. Screening of immunogenic molecular markers for active tuberculosis based on multiple machine learning algorithms[J]. Disease Surveillance, 2026, 41(3): 327-335. DOI: 10.3784/jbjc.202503200189

Screening of immunogenic molecular markers for active tuberculosis based on multiple machine learning algorithms

  • Objective To investigate the role of immune-related genes in active tuberculosis (ATB) using bioinformatics and machine learning.
    Methods The ATB-related datasets GSE42825, GSE42830, and GSE83456 were downloaded from the Gene Expression Omnibus database and screened for differentially expressed genes (DEGs) associated with tuberculosis (TB). Immune-related genes (IRGs) were obtained from the GeneCards database. The intersection of DEGs and IRGs gave the differentially expressed immune-related genes (DEIRGs), which then underwent functional enrichment and pathway analysis. Key genes were identified with three algorithms: Support Vector Machine-Recursive Feature Elimination, Least Absolute Shrinkage and Selection Operator, and Boruta. The area under the receiver operating characteristic (ROC) curve (AUC) was used in model construction and in internal and external validation. Calibration curves and clinical decision curves were used to evaluate the calibration and clinical utility of the model. The Shapley Additive exPlanations (SHAP) method was used to interpret the importance of each model feature and to elucidate the model's prediction process.
    Results A total of 502 DEGs were identified, and their intersection with the IRGs yielded 166 DEIRGs. Enrichment analysis showed that these DEIRGs were mainly associated with the Toll-like receptor, NOD-like receptor, nuclear factor-kappa B (NF-κB), mitogen-activated protein kinase, and phosphatidylinositol 3-kinase signaling pathways. In the validation set, five TB-related DEIRGs were screened by the three machine learning algorithms. One gene with an AUC below 0.70 was excluded, leaving four DEIRGs (AIM2, FCGR1A, IFITM3, SOCS1), which were used to construct a diagnostic model (AUC = 0.98). Calibration curves and the Hosmer-Lemeshow test indicated reliable calibration, and decision curve analysis demonstrated the clinical utility of the model. SHAP analysis ranked the four feature genes by importance as follows: IFITM3, FCGR1A, SOCS1, and AIM2.
    Conclusion Our findings indicate that the DEIRGs are associated with immune responses, cellular structures, and enzymatic activities, which improves the understanding of TB. These biomarkers might outperform previously reported molecular indicators in the combined detection and prediction of active TB.
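
The gene-screening logic described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' code: it assumes each feature-selection algorithm has already returned a set of gene names, takes the genes selected by all of them, and then drops any gene whose single-gene AUC falls below 0.70. The gene names (GENE_A to GENE_D), expression values, and function names are hypothetical, and the AUC is computed by a simple pairwise comparison that assumes higher expression indicates disease.

```python
from typing import Dict, List, Sequence, Set


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random case scores above a random
    control, with ties counted as 0.5 (assumes higher score = disease)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def consensus_genes(selections: Sequence[Set[str]]) -> Set[str]:
    """Genes chosen by every feature-selection method."""
    return set.intersection(*selections)


def filter_by_auc(expr: Dict[str, List[float]], labels: Sequence[int],
                  genes: Set[str], threshold: float = 0.70) -> List[str]:
    """Keep genes whose single-gene AUC reaches the threshold."""
    return sorted(g for g in genes if auc(expr[g], labels) >= threshold)


# Hypothetical toy data: 3 cases (label 1) followed by 3 controls (label 0).
labels = [1, 1, 1, 0, 0, 0]
expr = {
    "GENE_A": [5, 6, 7, 1, 2, 3],
    "GENE_B": [3, 1, 2, 2, 3, 1],
    "GENE_C": [4, 5, 2, 1, 3, 2],
}

# Hypothetical outputs of three feature-selection runs.
svm_rfe = {"GENE_A", "GENE_B", "GENE_C"}
lasso = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
boruta = {"GENE_A", "GENE_B", "GENE_C"}

consensus = consensus_genes([svm_rfe, lasso, boruta])
kept = filter_by_auc(expr, labels, consensus)
print(kept)
```

In this toy example GENE_D is dropped because only one method selected it, and GENE_B is dropped because its single-gene AUC is below the 0.70 cutoff.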