Comparing Methods for Record Linkage for Public Health Action: Matching Algorithm Validation Study

Author:

Avoundjian Tigran, Dombrowski Julia C, Golden Matthew R, Hughes James P, Guthrie Brandon L, Baseman Janet, Sadinle Mauricio

Abstract

Background: Many public health departments use record linkage between surveillance data and external data sources to inform public health interventions. However, little guidance is available to inform these activities, and many health departments rely on deterministic algorithms that may miss many true matches. In the context of public health action, these missed matches lead to missed opportunities to deliver interventions and may exacerbate existing health inequities.

Objective: This study aimed to compare the performance of record linkage algorithms commonly used in public health practice.

Methods: We compared five deterministic (exact, Stenger, Ocampo 1, Ocampo 2, and Bosh) and two probabilistic (fastLink and beta record linkage [BRL]) record linkage algorithms using simulations and a real-world scenario. We simulated pairs of datasets, varying the number of errors per record and the number of matching records between the two datasets (ie, overlap). We matched the datasets using each algorithm and calculated their recall (ie, sensitivity, the proportion of true matches identified by the algorithm) and precision (ie, positive predictive value, the proportion of matches identified by the algorithm that were true matches). We estimated the average computation time by performing a match with each algorithm 20 times while varying the size of the datasets being matched. In a real-world scenario, HIV and sexually transmitted disease surveillance data from King County, Washington, were matched to identify people living with HIV who had a syphilis diagnosis in 2017. We calculated the recall and precision of each algorithm compared with a composite standard based on the agreement in matching decisions across all the algorithms and manual review.

Results: In simulations, BRL and fastLink maintained a high recall at nearly all data quality levels, while being comparable with deterministic algorithms in terms of precision. Deterministic algorithms typically failed to identify matches in scenarios with low data quality. All the deterministic algorithms had a shorter average computation time than the probabilistic algorithms. BRL had the slowest overall computation time (14 min when both datasets contained 2000 records). In the real-world scenario, BRL had the smallest trade-off between recall (309/309, 100.0%) and precision (309/312, 99.0%).

Conclusions: Probabilistic record linkage algorithms maximize the number of true matches identified, reducing gaps in the coverage of interventions and maximizing the reach of public health action.
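The recall and precision definitions given in the Methods can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy records, the field names, and the helper names `exact_match` and `recall_precision` are hypothetical and are not the study's code. The matcher only shows the general idea of exact deterministic linkage (agreement on every field); it does not reproduce the Stenger, Ocampo, or Bosh rules, which are not described in this abstract.

```python
def recall_precision(predicted, truth):
    """Return (recall, precision) for two collections of (id_a, id_b) pairs."""
    predicted, truth = set(predicted), set(truth)
    true_pos = len(predicted & truth)
    # Recall: share of true matches that the algorithm found.
    recall = true_pos / len(truth) if truth else float("nan")
    # Precision: share of declared matches that are true matches.
    precision = true_pos / len(predicted) if predicted else float("nan")
    return recall, precision


def exact_match(records_a, records_b, fields):
    """Hypothetical exact deterministic linkage: link records that agree on every field."""
    index = {}
    for id_b, rec in records_b.items():
        index.setdefault(tuple(rec[f] for f in fields), []).append(id_b)
    return {
        (id_a, id_b)
        for id_a, rec in records_a.items()
        for id_b in index.get(tuple(rec[f] for f in fields), [])
    }


# Toy data (invented for this sketch); b2 carries a one-letter error in the first name.
file_a = {
    "a1": {"first": "maria", "last": "lopez", "dob": "1980-02-14"},
    "a2": {"first": "john", "last": "smith", "dob": "1975-07-01"},
    "a3": {"first": "wei", "last": "chen", "dob": "1990-11-30"},
}
file_b = {
    "b1": {"first": "maria", "last": "lopez", "dob": "1980-02-14"},
    "b2": {"first": "jon", "last": "smith", "dob": "1975-07-01"},
    "b3": {"first": "ana", "last": "silva", "dob": "1988-05-09"},
}
true_matches = {("a1", "b1"), ("a2", "b2")}

links = exact_match(file_a, file_b, ["first", "last", "dob"])
recall, precision = recall_precision(links, true_matches)
print(f"links={sorted(links)} recall={recall:.2f} precision={precision:.2f}")

# The real-world BRL figures quoted in the Results, recomputed from their counts.
brl_recall = 309 / 309
brl_precision = 309 / 312
print(f"BRL recall={brl_recall:.1%} precision={brl_precision:.1%}")
```

In this toy case the single typo causes the exact matcher to miss one of the two true matches while every link it does declare is correct, which is the pattern of lower recall at similar precision that the Results describe for deterministic algorithms on low-quality data.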

Publisher

JMIR Publications Inc.

Subject

Public Health, Environmental and Occupational Health, Health Informatics

Cited by 8 articles.

1. Linking WIC program and HMO administrative data to study the impact of WIC participation; Children and Youth Services Review; 2024-01

2. Record Linkage for Malaria Deaths Data Recovery and Surveillance in Brazil; Tropical Medicine and Infectious Disease; 2023-12-14

3. Changes in Residential Greenspace and Birth Outcomes among Siblings: Differences by Maternal Race; International Journal of Environmental Research and Public Health; 2023-09-21

4. Improving Probabilistic Record Linkage Using Statistical Prediction Models; International Statistical Review; 2022-12-04

5. Enhancing Human Biomonitoring Studies through Linkage to Administrative Registers–Status in Europe; International Journal of Environmental Research and Public Health; 2022-05-06
