李言飞, 张业武, 王晓风, 王丽萍. 2005-2017年全国法定传染病重复报告卡大数据分析与应用[J]. 疾病监测, 2019, 34(5): 468-472. DOI: 10.3784/j.issn.1003-9961.2019.05.021
Yanfei Li, Yewu Zhang, Xiaofeng Wang, Liping Wang. Application analysis of big data of duplicate reporting cards in National Notifiable Disease Report System, 2005–2017[J]. Disease Surveillance, 2019, 34(5): 468-472. DOI: 10.3784/j.issn.1003-9961.2019.05.021

2005-2017年全国法定传染病重复报告卡大数据分析与应用

Application analysis of big data of duplicate reporting cards in National Notifiable Disease Report System, 2005–2017

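  • Abstract:
    Objective To describe the current state of duplicate reporting of notifiable infectious disease report cards (duplicate cards) in the national infectious disease reporting information management system, analyze the causes of duplication, and propose solutions, so as to further standardize report management and improve data quality.
    Methods On a big data analysis platform built around Hadoop and Spark, Python and Jupyter Notebook were used to count, according to the duplicate-checking conditions, within-year, cumulative and cross-year duplicate cards among the national notifiable infectious disease report cards from 2005 to 2017. The results were plotted with the Python package matplotlib.
    Results From 2005 to 2017, the within-year duplicate card rate averaged 7.65/10 000. There were 1 141 539 cumulative duplicate cards, a cumulative duplicate card rate of 133.47/10 000. The three diseases with the most duplicate cards were hepatitis B, hand, foot and mouth disease, and pulmonary tuberculosis, accounting for 30.23%, 28.01% and 12.96%, respectively. In 2017, the within-year duplicate card rate was 11.19/10 000, with 8 497 within-year duplicate cards and a cumulative total of 276 194 cross-year duplicate cards.
    Conclusion Within-year duplicate reporting of notifiable infectious disease cards still requires stronger management. Cross-year and cumulative duplicate cards are increasing year by year and seriously affect data analysis, so corresponding data management and analysis measures are needed as soon as possible.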

     

    Abstract:
    Objective To understand the current status of duplicate reporting cards in the National Notifiable Disease Report System (NNDRS), analyze the causes of duplicate reporting, and suggest solutions to further standardize reporting management and improve data quality.
    Methods Within-year, cumulative and cross-year duplicate reporting cards for notifiable diseases in China from January 2005 to December 2017 were analyzed with Python and Jupyter Notebook on a big data analysis platform based on Hadoop and Spark. The results were plotted with the Python library matplotlib.
    Results During 2005–2017, the average within-year rate of duplicate reporting cards was 7.65/10 000, the cumulative number of duplicate reporting cards was 1 141 539, and the cumulative rate of duplicate reporting cards was 133.47/10 000. The three diseases with the most duplicate reporting cards were hepatitis B, hand, foot and mouth disease, and pulmonary tuberculosis, accounting for 30.23%, 28.01% and 12.96% of the total, respectively. In 2017, the within-year rate of duplicate reporting cards was 11.19/10 000, the number of within-year duplicate reporting cards was 8 497, and the cumulative number of cross-year duplicate reporting cards was 276 194.
    Conclusion Within-year duplicate reporting in NNDRS still needs stronger management. Cross-year and cumulative duplicate reporting cards are increasing year by year and seriously affect data analysis. Appropriate data management and analysis measures should be taken as soon as possible.
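The counting described in the Methods can be illustrated with a small, plain-Python sketch. It is not the authors' implementation, which ran on Hadoop and Spark. The matching key (`name`, `sex`, `birth_date`, `disease`) is a hypothetical stand-in, since the abstract does not list the actual duplicate-checking conditions. The definitions of "within-year" and "cross-year" used here, and the rate formula (duplicates per 10 000 cards), are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical matching key: the abstract does not state the real
# duplicate-checking conditions, so these fields are an assumption.
KEY_FIELDS = ("name", "sex", "birth_date", "disease")


def classify_duplicates(cards):
    """Label each card as 'first', 'within_year' or 'cross_year'.

    cards: list of dicts holding KEY_FIELDS plus 'year', assumed to be
    sorted by report time. A card is 'within_year' if the same key was
    already reported in the same year, 'cross_year' if it was reported
    only in other years, and 'first' otherwise (assumed definitions).
    """
    seen_years = defaultdict(set)  # key -> years in which it was reported
    labels = []
    for card in cards:
        key = tuple(card[field] for field in KEY_FIELDS)
        years = seen_years[key]
        if card["year"] in years:
            labels.append("within_year")
        elif years:
            labels.append("cross_year")
        else:
            labels.append("first")
        years.add(card["year"])
    return labels


def rate_per_10k(n_duplicates, n_cards):
    """Duplicate cards per 10 000 reported cards (assumed rate definition)."""
    return 10_000 * n_duplicates / n_cards


def make_card(name, disease, year):
    # Toy record builder; sex and birth date are fixed placeholders.
    return {"name": name, "sex": "F", "birth_date": "1990-01-01",
            "disease": disease, "year": year}


cards = [
    make_card("A", "hepatitis B", 2016),
    make_card("A", "hepatitis B", 2016),
    make_card("A", "hepatitis B", 2017),
    make_card("B", "hand, foot and mouth disease", 2017),
    make_card("A", "hepatitis B", 2017),
]
labels = classify_duplicates(cards)
counts = Counter(labels)
# Cumulative duplicates here means every card that is not a first report.
cumulative_duplicates = len(cards) - counts["first"]
cumulative_rate = rate_per_10k(cumulative_duplicates, len(cards))
```

On real data the same grouping would more likely be expressed as a Spark or pandas group-by over the key columns. The single pass above only makes the three categories explicit.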

     
