基于广东省三所医院7265例住院病人的流行性感冒监测预警模型研究

A study of an early-warning model for influenza surveillance based on 7265 hospitalized patients in three hospitals in Guangdong

  • 摘要:
    目的 基于非结构化电子医疗记录,建立针对住院患者的流行性感冒(流感)预测模型,并识别潜在预测因子。
    方法 基于广东省三家医院7265例住院患者的非结构化电子医疗记录,结合机器学习模型构建流感预测模型。首先通过随机森林、Boruta、极端梯度提升(XGBoost)和χ2检验方法筛选出18个显著预测变量,并对比了随机森林、支持向量机、朴素贝叶斯、多层感知机和XGBoost模型的性能。
    结果 XGBoost模型在内部验证中受试者工作特征曲线下面积(AUC)值为0.92(95%CI: 0.92~0.93),准确率为0.86(95%CI:0.86~0.87);外部验证中AUC值为0.67(95%CI:0.61~0.74),准确率为0.79(95%CI:0.76~0.81),综合表现最优。沙普利加性解释(SHAP)算法在传统的咳嗽、发热、年龄、性别、鼻塞等预测因素外,进一步识别出活动受限、头晕、胸闷和纳差等潜在的预测因子。
    结论 本研究建立的模型可辅助临床早期识别流感高危患者,优化诊疗决策,提升医疗资源效率,并识别出潜在的与流感相关的症状。

     

    Abstract:
    Objective To develop an influenza prediction model for hospitalized patients from unstructured electronic medical records (EMRs) and to identify potential predictors of influenza.
    Methods Machine learning models were used to build an influenza prediction model from the unstructured EMRs of 7265 hospitalized patients in three hospitals in Guangdong province. Eighteen significant predictors were first selected by combining random forest, Boruta, extreme gradient boosting (XGBoost), and χ2 tests. The performance of random forest, support vector machine (SVM), naive Bayes (NB), multilayer perceptron (MLP), and XGBoost models was then compared.
    Results The XGBoost model showed the best overall performance, with an area under the receiver operating characteristic curve (AUC) of 0.92 (95% CI: 0.92-0.93) and an accuracy of 0.86 (95% CI: 0.86-0.87) in internal validation, and an AUC of 0.67 (95% CI: 0.61-0.74) and an accuracy of 0.79 (95% CI: 0.76-0.81) in external validation. Beyond traditional predictors (e.g., cough, fever, age, gender, nasal congestion), the Shapley additive explanations (SHAP) algorithm identified further potential predictors, including limited activity, dizziness, chest tightness, and poor appetite.
    Conclusion The proposed model can support the early clinical identification of patients at high risk of influenza, optimize diagnostic and treatment decisions, and improve the efficiency of healthcare resource use; it also identified potential influenza-associated symptoms.
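
    The Results report each AUC with a 95% CI. As a rough illustration of how such an interval can be obtained, the sketch below computes AUC as a pairwise ranking probability and derives a percentile bootstrap interval. The abstract does not state which CI method the authors used, so the bootstrap, the function names `auc` and `bootstrap_ci`, and the toy labels and scores are all assumptions made for illustration only.

```python
import random

def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUC (an assumed method, not the paper's)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample contains one class only; AUC is undefined
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Toy data (hypothetical): 1 = influenza, 0 = no influenza.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auc(toy_labels, toy_scores))  # 0.75
print(bootstrap_ci(toy_labels, toy_scores, n_boot=200))
```

    With only four toy cases the interval is very wide; on a validation set of realistic size the same procedure gives a much narrower range.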

     
