Objective: To understand the current status of duplicate reporting cards in the National Notifiable Disease Report System (NNDRS), analyze the causes of duplicate reporting, and suggest solutions for further standardizing reporting management and improving data quality.
Methods: Annual, cumulative, and cross-year ("beyond year") duplicate reporting cards for notifiable diseases in China from January 2005 to December 2017 were analyzed using Python and Jupyter Notebook on a Hadoop and Spark big data analysis platform. The results were visualized with the Python library matplotlib.
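The abstract does not state the rule used to decide that two cards are duplicates. The following is a minimal pure-Python sketch of how cards might be split into within-year and cross-year ("beyond year") duplicates. It assumes, purely for illustration, that cards sharing a hypothetical `person_key` and the same disease are duplicates; the field names and the matching rule are assumptions, not the study's method, and the study itself ran on Spark rather than in-memory Python.

```python
from collections import defaultdict


def count_duplicates(cards):
    """Count within-year and cross-year duplicate cards.

    `cards` is an iterable of (card_id, person_key, disease, year) tuples.
    All field names are hypothetical; the matching key (person_key, disease)
    is an assumed rule, not one taken from the source.
    """
    # Group report years by the assumed duplicate key.
    groups = defaultdict(list)
    for _card_id, person_key, disease, year in cards:
        groups[(person_key, disease)].append(year)

    within_year = 0
    cross_year = 0
    for years in groups.values():
        years.sort()
        seen_years = set()
        for year in years:
            if year in seen_years:
                # Another card for the same key already exists in this year.
                within_year += 1
            elif seen_years:
                # The key was first reported in an earlier year.
                cross_year += 1
            seen_years.add(year)
    return within_year, cross_year


example_cards = [
    (1, "a", "hepatitis B", 2016),
    (2, "a", "hepatitis B", 2016),   # within-year duplicate of card 1
    (3, "a", "hepatitis B", 2017),   # cross-year duplicate of card 1
    (4, "b", "tuberculosis", 2017),  # no duplicate
]
within, cross = count_duplicates(example_cards)
```

Under this assumed rule the first card in each group is treated as the original and every later card as a duplicate. A real rule would need to allow for genuine repeat episodes of the same disease.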
Results: During 2005–2017, the annual average rate of duplicate reporting cards was 7.65/10 000; the cumulative number of duplicate reporting cards was 1 141 539, and the rate of cumulative duplicate reporting cards was 133.47/10 000. The three diseases with the most duplicate reporting cards were hepatitis B, hand, foot and mouth disease, and tuberculosis, accounting for 30.23%, 28.01%, and 12.96% of the total, respectively. In 2017, the rate of duplicate reporting cards was 11.19/10 000, the number of annual duplicate reporting cards was 8 497, and the cumulative number of cross-year duplicate reporting cards was 276 194.
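The rates above are expressed per 10 000 reporting cards, but the abstract does not report the denominators. As a hedged illustration of the arithmetic only, the sketch below assumes that each rate is the number of duplicate cards divided by the total number of cards, multiplied by 10 000. The function names are hypothetical, and any total computed this way is an implied approximation, not a figure from the source.

```python
def rate_per_10k(duplicates, total_cards):
    """Duplicate cards per 10 000 reporting cards (assumed definition)."""
    return duplicates / total_cards * 10_000


def implied_total(duplicates, rate):
    """Total card count implied by a duplicate count and a per-10 000 rate."""
    return duplicates / rate * 10_000


# Figures quoted in the Results; the implied totals are approximations only.
total_2017 = implied_total(8_497, 11.19)
total_cumulative = implied_total(1_141_539, 133.47)
```

Feeding an implied total back into `rate_per_10k` returns the quoted rate, which checks the arithmetic but does not validate the assumed definition.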
Conclusion: Data quality in NNDRS needs to be improved, given the increases in annual cross-year duplicate reporting cards and in the rate of cumulative duplicate reporting cards. Duplicate reporting cards seriously affect data analysis. Appropriate data management and analysis measures should be taken as soon as possible.