Application of SARIMA model and Holt-Winters exponential smoothing method to predict the incidence of pulmonary tuberculosis in Jiangsu
Abstract: Objective To establish a seasonal autoregressive integrated moving average (SARIMA) model and a Holt-Winters exponential smoothing model for predicting the number of pulmonary tuberculosis (TB) cases in Jiangsu province, to evaluate the accuracy of the two methods, and to provide a scientific reference for the prevention and control of pulmonary TB in Jiangsu. Methods Both models were fitted to the pulmonary TB case data for Jiangsu from January 2016 to December 2020. The models were validated against the number of cases reported from January to December 2021, and their predictive performance was evaluated with the root-mean-square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE). Results The best-fitting SARIMA model was (0,1,2)(0,1,0)12, with an RMSE of 229.52, an MAE of 146.81, a MAPE of 6.33% and a total relative error of 5.21%. The Holt-Winters additive model had an RMSE of 206.75, an MAE of 156.45, a MAPE of 6.63% and a total relative error of 7.74%. Conclusion Both models fitted the number of pulmonary TB cases well, and the SARIMA model performed slightly better in prediction.
Key words:
- ARIMA model
- Holt-Winters additive model
- Pulmonary tuberculosis
- Model prediction
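For orientation, the two model families named in the abstract can be written out as follows. This is a sketch in standard textbook notation, not the authors' own specification: $B$ is the backshift operator, $y_t$ is the monthly series being modeled (after any transformation the authors may have applied, which is not stated here), the moving-average terms use the plus-sign convention, and the Holt-Winters recursions follow one common additive parameterization with a seasonal period of 12. Software packages differ in these conventions.

```latex
% SARIMA(0,1,2)(0,1,0)_{12}: one regular and one seasonal difference, two MA terms
\begin{align}
(1 - B)(1 - B^{12})\, y_t &= (1 + \theta_1 B + \theta_2 B^2)\, \varepsilon_t \\
y_t &= y_{t-1} + y_{t-12} - y_{t-13}
       + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2}
\end{align}

% Additive Holt-Winters smoothing: level, trend, seasonal component, forecast
\begin{align}
\ell_t &= \alpha\,(y_t - s_{t-12}) + (1 - \alpha)(\ell_{t-1} + b_{t-1}) \\
b_t &= \beta\,(\ell_t - \ell_{t-1}) + (1 - \beta)\, b_{t-1} \\
s_t &= \gamma\,(y_t - \ell_t) + (1 - \gamma)\, s_{t-12} \\
\hat{y}_{t+h} &= \ell_t + h\, b_t + s_{t+h-12}, \qquad 1 \le h \le 12
\end{align}
```

The second SARIMA line simply expands $(1 - B)(1 - B^{12}) = 1 - B - B^{12} + B^{13}$ and moves the lagged terms to the right-hand side.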
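The three evaluation metrics named in the abstract have standard definitions, sketched below in plain Python. The function names and the two-point toy series are illustrative assumptions, not data or code from the study.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error: square root of the mean squared difference."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error: mean of the absolute differences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical toy values, NOT data from the study.
toy_actual = [100, 200]
toy_predicted = [110, 180]

print(rmse(toy_actual, toy_predicted))
print(mae(toy_actual, toy_predicted))
print(mape(toy_actual, toy_predicted))
```

RMSE penalizes large individual misses more heavily than MAE, which is one reason two models can rank differently on the two metrics.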
Table 1. Tests of candidate models

| Model parameters | Residual test (P) | AIC | BIC | MAPE (%) |
| --- | --- | --- | --- | --- |
| (0,1,2)(0,1,0)12 | 0.81 | 661.94 | 667.41 | 6.34 |
| (0,1,2)(0,1,1)12 | 0.80 | 663.35 | 670.72 | 6.27 |
| (0,1,2)(0,1,2)12 | 0.87 | 662.43 | 671.63 | 5.93 |
| (0,1,2)(1,1,0)12 | 0.83 | 663.66 | 671.01 | 6.30 |
| (0,1,2)(1,1,1)12 | 0.83 | 664.67 | 673.90 | 6.07 |
| (0,1,2)(1,1,2)12 | 0.84 | 663.85 | 674.91 | 5.55 |
| (0,1,2)(2,1,0)12 | 0.86 | 662.38 | 671.60 | 6.08 |
| (0,1,2)(2,1,1)12 | 0.75 | 662.75 | 673.84 | 5.05 |
| (0,1,2)(2,1,2)12 | 0.61 | 663.31 | 676.26 | 4.25 |

AIC: Akaike information criterion; BIC: Bayesian information criterion; MAPE: mean absolute percentage error.
Table 2. Fit evaluation of the Holt-Winters models

| Model | α | β | γ | MAE | RMSE | MAPE (%) |
| --- | --- | --- | --- | --- | --- | --- |
| Holt-Winters additive model | 0.80 | 0.001 | 0.001 | 156.45 | 206.75 | 6.63 |
| Holt-Winters multiplicative model | 0.71 | 0.001 | 0.001 | 165.37 | 221.91 | 7.01 |
Table 3. ARIMA(0,1,2)(0,1,0)12 predictions of pulmonary TB cases in Jiangsu, January-December 2021

| Month | Actual cases | Predicted cases | Relative error (%) | 95% confidence interval |
| --- | --- | --- | --- | --- |
| 1 | 2 215 | 2 020 | 8.80 | 1 512.14~2 528.68 |
| 2 | 1 909 | 1 265 | 33.73 | 557.45~1 972.32 |
| 3 | 2 645 | 2 002 | 24.31 | 1 108.36~2 895.41 |
| 4 | 2 640 | 2 423 | 8.22 | 1 375.83~3 469.93 |
| 5 | 2 537 | 2 468 | 2.72 | 1 287.10~3 648.66 |
| 6 | 2 550 | 2 493 | 2.24 | 1 192.05~3 793.71 |
| 7 | 2 700 | 2 647 | 1.97 | 1 236.17~4 057.58 |
| 8 | 2 055 | 2 428 | 18.15 | 915.80~3 940.50 |
| 9 | 2 220 | 2 425 | 9.23 | 816.80~4 032.96 |
| 10 | 2 167 | 2 128 | 1.78 | 429.69~3 826.07 |
| 11 | 2 245 | 2 242 | 0.13 | 458.12~4 025.63 |
| 12 | 2 006 | 1 895 | 5.53 | 29.48~3 760.28 |
| Total | 27 889 | 26 436 | 5.21 | – |
Table 4. Holt-Winters additive model predictions of pulmonary TB cases in Jiangsu, January-December 2021

| Month | Actual cases | Predicted cases | Relative error (%) | 95% confidence interval |
| --- | --- | --- | --- | --- |
| 1 | 2 215 | 2 585 | 16.73 | 2 073.69~3 097.39 |
| 2 | 1 909 | 2 284 | 19.65 | 1 655.42~2 912.95 |
| 3 | 2 645 | 2 963 | 12.05 | 2 236.64~3 690.87 |
| 4 | 2 640 | 2 952 | 11.84 | 2 138.92~3 766.26 |
| 5 | 2 537 | 2 761 | 8.83 | 1 869.34~3 653.05 |
| 6 | 2 550 | 2 637 | 3.41 | 1 673.29~3 600.73 |
| 7 | 2 700 | 2 666 | 1.23 | 1 636.03~3 697.20 |
| 8 | 2 055 | 2 517 | 22.52 | 1 424.56~3 611.31 |
| 9 | 2 220 | 2 402 | 8.22 | 1 249.86~3 555.35 |
| 10 | 2 167 | 2 106 | 2.80 | 897.10~3 315.51 |
| 11 | 2 245 | 2 159 | 3.83 | 895.72~3 422.00 |
| 12 | 2 006 | 2 015 | 0.45 | 700.41~3 330.15 |
| Total | 27 889 | 30 047 | 7.74 | – |
Table 5. Evaluation of the predictions of the two models

| Model | MAE | RMSE | MAPE (%) |
| --- | --- | --- | --- |
| ARIMA(0,1,2)(0,1,0)12 model | 146.81 | 229.52 | 6.33 |
| Holt-Winters additive model | 156.45 | 206.75 | 6.63 |
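The total relative errors in the last rows of Tables 3 and 4 appear to be the absolute difference between the summed predictions and the summed actual counts, divided by the summed actual counts. The sketch below checks that reading against the rounded monthly values printed in the two tables. The function and variable names are illustrative assumptions. The monthly relative errors in the tables seem to be computed from unrounded predictions, so recomputing them from the rounded counts can differ in the second decimal place.

```python
# Monthly counts for January-December 2021, copied from Tables 3 and 4.
actual = [2215, 1909, 2645, 2640, 2537, 2550, 2700, 2055, 2220, 2167, 2245, 2006]
sarima_pred = [2020, 1265, 2002, 2423, 2468, 2493, 2647, 2428, 2425, 2128, 2242, 1895]
holt_winters_pred = [2585, 2284, 2963, 2952, 2761, 2637, 2666, 2517, 2402, 2106, 2159, 2015]


def total_relative_error(actual, predicted):
    """Relative error of the annual total, in percent (assumed definition)."""
    return 100.0 * abs(sum(predicted) - sum(actual)) / sum(actual)


def monthly_relative_errors(actual, predicted):
    """Per-month relative error, in percent, from the rounded table values."""
    return [100.0 * abs(p - a) / a for a, p in zip(actual, predicted)]


sarima_total = round(total_relative_error(actual, sarima_pred), 2)
holt_winters_total = round(total_relative_error(actual, holt_winters_pred), 2)

print(sum(actual), sum(sarima_pred), sum(holt_winters_pred))
print(sarima_total, holt_winters_total)
```

Under this definition the annual totals and the two total relative errors (5.21% and 7.74%) match the tables. Because monthly over- and under-predictions cancel in the sum, the total relative error can be small even when individual months, such as February and March in Table 3, are far off.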