Abstract:
Objective To develop an influenza prediction model for hospitalized patients using unstructured electronic medical records (EMRs) and to identify potential predictors of influenza.
Methods Machine learning models were built on unstructured EMRs from 7265 hospitalized patients at three hospitals in Guangdong Province to establish an influenza prediction framework. Eighteen significant predictors were selected through a hybrid feature selection approach combining random forest, Boruta, extreme gradient boosting (XGBoost), and χ2 tests. The performance of random forest, support vector machine (SVM), naive Bayes (NB), multilayer perceptron (MLP), and XGBoost models was then systematically compared.
Results The XGBoost model performed best, achieving an area under the receiver operating characteristic curve (AUC) of 0.92 (95% CI: 0.92-0.93) and an accuracy of 0.86 (95% CI: 0.86-0.87) in internal validation, and an AUC of 0.67 (95% CI: 0.61-0.74) and an accuracy of 0.79 (95% CI: 0.76-0.81) in external validation. In addition to traditional predictors (e.g., cough, fever, age, gender, nasal congestion), the Shapley additive explanations (SHAP) algorithm identified novel predictors, including limited activity, dizziness, chest tightness, and anorexia.
Conclusion The proposed model can facilitate the early identification of populations at high risk for influenza, support clinical decision-making, improve the efficiency of healthcare resource use, and help identify potential influenza-associated symptoms.
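
As an illustration of how the AUC values and 95% confidence intervals reported in the Results might be computed, the following is a minimal sketch only. The abstract does not state how the intervals were obtained, so the percentile bootstrap used here is an assumption, and the function names (`auc`, `bootstrap_auc_ci`) and the toy labels and scores are hypothetical and are not study data.

```python
import random


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        # Resample patients with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUC is undefined when a resample contains one class only
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo_idx = max(0, int((alpha / 2) * n_boot))
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot) - 1)
    return stats[lo_idx], stats[hi_idx]


# Hypothetical toy data: 1 = influenza, 0 = no influenza.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc(toy_labels, toy_scores)  # 3 of 4 pairs ranked correctly: 0.75
toy_ci = bootstrap_auc_ci(toy_labels, toy_scores)
```

On real validation data, `labels` would hold the confirmed influenza status and `scores` the model's predicted probabilities. The pairwise AUC is quadratic in the number of patients, so a rank-based implementation would be preferable for cohorts of the size described here.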