Citation: ZHANG Ye-wu, GUO Qing, ZHANG Chun-xi, WANG Xiao-feng, YU Meng, SU Xue-mei. Application of probabilistic record linkage method in communicable disease reporting information matching[J]. Disease Surveillance, 2015, 30(9): 792-795. DOI: 10.3784/j.issn.1003-9961.2015.09.022

Application of probabilistic record linkage method in communicable disease reporting information matching

Abstract: Objective To match field survey records with records in the online communicable disease direct reporting system during a nationwide survey of the quality of notifiable communicable disease reporting by medical institutions, using probabilistic record linkage across the two sources. Methods A modified Fellegi-Sunter probabilistic record linkage method was used. Weights were assigned to the matching fields, a similarity score was calculated for each pair of records, and a pair was treated as a match when its score exceeded a threshold (cut-off value). The automatic matching results were checked manually, and the manual judgments served as the gold standard for evaluating the automatic results. Results A total of 2153 original records from the field survey were matched against 97,271 communicable disease report cards in the online reporting system using stratified, multi-dimensional probabilistic matching. With a total score of 25 as the threshold, the automatic matching had a sensitivity of 98.96% (95% CI: 98.39%-99.36%), a specificity of 94.92% (95% CI: 91.29%-97.35%), an overall agreement rate of 98.51% (95% CI: 97.91%-98.98%), a Kappa value of 0.9250 and an area under the ROC curve of 0.9979. Conclusion The stratified, multi-dimensional probabilistic matching method successfully matched the original field survey data with the online reporting system data. The matching results agreed closely with the actual situation, the method markedly improved work efficiency, and it offers a simple analytical tool for similar work in the future.
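The scoring and evaluation steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The field names and weights in `FIELD_WEIGHTS` are hypothetical, since the abstract does not list the matching fields or their coefficients. Only the threshold of 25 comes from the abstract, and whether the cut-off is inclusive is an assumption. The 2×2 counts in the usage section are illustrative values chosen to be consistent with the reported percentages; the abstract does not report the underlying table.

```python
# Hypothetical matching fields and weights (not given in the abstract).
FIELD_WEIGHTS = {
    "name": 10.0,
    "sex": 2.0,
    "birth_date": 8.0,
    "disease": 5.0,
    "onset_date": 5.0,
}
THRESHOLD = 25.0  # cut-off score reported in the abstract


def similarity_score(survey, report, weights=FIELD_WEIGHTS):
    """Sum the weights of the fields on which the two records agree exactly."""
    return sum(
        weight
        for field, weight in weights.items()
        if survey.get(field) is not None and survey.get(field) == report.get(field)
    )


def best_match(survey, reports, threshold=THRESHOLD):
    """Return (report, score) for the top-scoring candidate, or None.

    Assumption: a score equal to the threshold counts as a match.
    """
    best = max(reports, key=lambda r: similarity_score(survey, r), default=None)
    if best is None:
        return None
    score = similarity_score(survey, best)
    return (best, score) if score >= threshold else None


def evaluate(tp, fn, fp, tn):
    """Compare automatic matching with manual review (the gold standard)."""
    n = tp + fn + fp + tn
    agreement = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "agreement": agreement,
        "kappa": (agreement - expected) / (1 - expected),
    }


if __name__ == "__main__":
    survey = {"name": "Li Lei", "sex": "M", "birth_date": "1980-01-01",
              "disease": "measles", "onset_date": "2014-05-01"}
    candidates = [
        dict(survey, name="Wang Fang", birth_date="1990-02-02"),
        dict(survey, onset_date="2014-05-03"),
    ]
    print(best_match(survey, candidates))
    # Illustrative counts only; the source does not give the 2x2 table.
    print(evaluate(tp=1897, fn=20, fp=12, tn=224))
```

A full Fellegi-Sunter model would derive each field's agreement and disagreement weights from estimated match and non-match probabilities, and the stratification mentioned in the abstract would restrict which candidate pairs are compared. Both are left out here to keep the sketch short.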

     
