Liu Tian, Ruan Dexin, Hou Qingbo, Chen Hongying. Construction of combinatorial prediction model for infectious diseases based on software R[J]. Disease Surveillance, 2023, 38(9): 1094-1100. DOI: 10.3784/jbjc.202211090482

Construction of combinatorial prediction model for infectious diseases based on software R

    Abstract:
      Objective  To construct combinatorial prediction models for infectious diseases using R software and to provide a reference for disease surveillance staff.
      Methods  The monthly incidence of hemorrhagic fever with renal syndrome (HFRS) from 2004 to 2017 in China (excluding data from the Hong Kong and Macao Special Administrative Regions and Taiwan Province) and in Jilin, Liaoning and Heilongjiang provinces was used as training data to fit the models, and the data from January to December 2018 were used to evaluate prediction performance. Four single models were combined: seasonal autoregressive integrated moving average (SARIMA), exponential smoothing (ETS), neural network autoregression (NNETAR), and the exponential smoothing state space model (TBATS). The combinatorial models were built with the “forecastHybrid” package in R. The combinatorial model in which the single models have equal weights was recorded as combinatorial model A; the combinatorial model in which the weights of the single models were determined by their fit to the training data was recorded as combinatorial model B. Mean absolute percentage error (MAPE) and root mean square error (RMSE) were used to evaluate the fitting and prediction performance of the six models. For the sensitivity analysis, the data from 2004 to 2011, 2004 to 2012, 2004 to 2013, 2004 to 2014, 2004 to 2015, 2004 to 2016 and 2004 to 2017 were each used as a training set to build the models and predict the monthly incidence for January to December of the following year. The cumulative rank sums of the fitted and predicted MAPE and RMSE of the combinatorial models were calculated to evaluate the stability of their fitting and prediction performance.
      Results  In each list below, values are given in the order SARIMA, ETS, NNETAR, TBATS, combinatorial model A, combinatorial model B. The fitted MAPEs were 11.81%, 9.75%, 11.50%, 9.71%, 8.09%, 8.06% in China; 29.63%, 15.39%, 23.04%, 14.60%, 16.33%, 16.29% in Jilin; 19.76%, 15.48%, 3.93%, 15.24%, 12.66%, 7.08% in Liaoning; and 21.92%, 17.96%, 6.73%, 15.80%, 13.55%, 10.29% in Heilongjiang. The predicted MAPEs were 23.38%, 20.35%, 11.01%, 34.28%, 17.03%, 16.02% in China; 11.72%, 14.26%, 24.32%, 14.16%, 11.93%, 11.92% in Jilin; 28.09%, 27.57%, 29.19%, 27.32%, 26.91%, 26.49% in Liaoning; and 23.72%, 33.28%, 28.96%, 33.75%, 25.86%, 27.31% in Heilongjiang. The fitted RMSEs were 0.01, 0.01, 0.01, 0.01, 0.01, 0.01 in China; 0.08, 0.08, 0.05, 0.08, 0.05, 0.05 in Jilin; 0.08, 0.07, 0.01, 0.07, 0.04, 0.02 in Liaoning; and 0.16, 0.16, 0.04, 0.15, 0.08, 0.06 in Heilongjiang. The predicted RMSEs were 0.02, 0.01, 0.02, 0.02, 0.01, 0.01 in China; 0.03, 0.04, 0.07, 0.04, 0.05, 0.05 in Jilin; 0.07, 0.05, 0.05, 0.05, 0.05, 0.05 in Liaoning; and 0.13, 0.14, 0.11, 0.14, 0.12, 0.12 in Heilongjiang. In the sensitivity analysis, combinatorial model B ranked first overall for fitting in all four areas, and combinatorial model A ranked second to fourth. For prediction, the best rank achieved by the combinatorial models on the two evaluation indicators was first or second in all four areas.
      Conclusion  The combinatorial models fitted and predicted better than the single models, and the combinatorial model whose weights were determined by the fit to the training data was the optimal model. Combinatorial models can be built in R with simple programming, and their wider application is worth promoting.
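
The Methods name two error measures (MAPE and RMSE) and two weighting schemes (equal weights for model A, fit-based weights for model B). The sketch below illustrates those ideas in plain Python. It is not the forecastHybrid code used in the study. The abstract does not state the exact weighting rule, so the rule used here (weights inversely proportional to each model's training-set RMSE) is an assumption, and all function names and numbers are hypothetical.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def equal_weights(n_models):
    """Model A style: every single model gets the same weight."""
    return [1.0 / n_models] * n_models


def inverse_error_weights(errors):
    """Model B style (assumed rule): weight each model by 1/error, normalised to sum to 1."""
    inverse = [1.0 / e for e in errors]
    total = sum(inverse)
    return [v / total for v in inverse]


def combine(forecasts, weights):
    """Weighted average of several forecasts that share the same horizon."""
    horizon = len(forecasts[0])
    return [sum(w * f[t] for w, f in zip(weights, forecasts)) for t in range(horizon)]


# Toy numbers invented for illustration only; they are not data from the study.
actual = [0.10, 0.20, 0.40]
forecasts = [[0.12, 0.18, 0.44], [0.08, 0.26, 0.36]]  # two hypothetical single models
in_sample_rmse = [0.02, 0.06]                         # hypothetical training-set RMSE per model

model_a = combine(forecasts, equal_weights(len(forecasts)))
model_b = combine(forecasts, inverse_error_weights(in_sample_rmse))

for name, forecast in (("A", model_a), ("B", model_b)):
    print(name, round(mape(actual, forecast), 2), round(rmse(actual, forecast), 4))
```

With inverse-error weights, a single model that fits the training data closely dominates the combination. Such a scheme can therefore score well on fit without necessarily predicting better.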

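The sensitivity analysis in the Methods uses seven training sets that all start in 2004 and end one year later each time, and it compares models by cumulative rank sums. A minimal sketch of that bookkeeping follows. It assumes that a "cumulative rank sum" means adding up each model's rank across the training windows, and that tied errors share the average rank. Neither detail is stated in the abstract, and the error values are hypothetical.

```python
def expanding_windows(start_year, first_end_year, last_end_year):
    """Return ((start, end), test_year) pairs for training sets with a fixed start year."""
    return [((start_year, end), end + 1) for end in range(first_end_year, last_end_year + 1)]


def rank_errors(errors):
    """Rank models by error (1 = smallest); tied models share the average rank."""
    names = sorted(errors, key=errors.get)
    ranks = {}
    i = 0
    while i < len(names):
        j = i
        # Extend j over the run of models tied with names[i].
        while j + 1 < len(names) and errors[names[j + 1]] == errors[names[i]]:
            j += 1
        average_rank = ((i + 1) + (j + 1)) / 2
        for k in range(i, j + 1):
            ranks[names[k]] = average_rank
        i = j + 1
    return ranks


def cumulative_rank_sums(errors_per_window):
    """Sum each model's rank over all training windows (lower total = more stable)."""
    totals = {}
    for errors in errors_per_window:
        for name, rank in rank_errors(errors).items():
            totals[name] = totals.get(name, 0.0) + rank
    return totals


windows = expanding_windows(2004, 2011, 2017)  # seven training sets, tested on 2012 to 2018

# Hypothetical MAPE values for two windows; they are not results from the study.
toy_errors = [
    {"SARIMA": 12.0, "ETS": 10.0, "Model B": 8.0},
    {"SARIMA": 9.0, "ETS": 11.0, "Model B": 9.0},
]
rank_totals = cumulative_rank_sums(toy_errors)
print(windows)
print(rank_totals)
```

Summing ranks rather than raw errors keeps one unusually hard year from dominating the comparison, which suits a stability check.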
     
