Privacy-Preserving Distributed Medical Data Integration Security System for Accuracy Assessment of Cancer Screening: Development Study of Novel Data Integration System (Preprint)

Author:

Miyaji Atsuko, Watanabe Kaname, Takano Yuuki, Nakasho Kazuhisa, Nakamura Sho, Wang Yuntao, Narimatsu Hiroto

Abstract

BACKGROUND

Integrating the records of the same individuals across databases managed by different institutions yields big data that are useful for epidemiological research. Such integration must protect private information while still matching data efficiently and to a high standard.

OBJECTIVE

Privacy-Preserving Distributed Data Integration (PDDI) is a technology that enables data matching between multiple databases without moving private information. Because errors in matching keys must be taken into account, we conducted a basic matching experiment using a model for assessing the accuracy of cancer screening.

METHODS

We created a dataset that mimics cancer screening and cancer registration data in Japan and conducted a matching experiment using a PDDI system between geographically distant institutions. Errors similar to those found empirically in datasets recorded in Japanese were artificially introduced into the dataset. The matching-key error rate of the data common to both datasets was set well above the rate expected in actual databases: 85.0% and 59.0% for the data simulating colorectal and breast cancer, respectively. Various combinations of name, gender, date of birth, and address were used as the matching key. To evaluate matching accuracy, the matching sensitivity and specificity were calculated based on the number of cancer screening data points, and the effect of matching accuracy on the sensitivity and specificity of cancer screening was estimated from the obtained values. To evaluate performance, we measured CPU usage, memory usage, and network traffic.
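The evaluation described above can be illustrated with a minimal sketch. It assumes plaintext matching on toy records, which is not the PDDI protocol itself; the field names, the `pid` ground-truth identifier, and the helper functions `make_key` and `evaluate` are hypothetical and serve only to show how matching sensitivity and specificity depend on the choice of matching key.

```python
def make_key(record, fields):
    """Build a matching key from the selected fields (hypothetical normalization)."""
    return tuple(record[f].strip().lower() for f in fields)


def evaluate(screening, registry, fields):
    """Return (matching sensitivity, matching specificity) over screening records.

    Ground truth: a screening record truly belongs to the registry when its
    `pid` also appears there. `pid` is used for evaluation only, never as a key.
    """
    registry_keys = {make_key(r, fields) for r in registry}
    registry_pids = {r["pid"] for r in registry}
    tp = fn = tn = fp = 0
    for rec in screening:
        matched = make_key(rec, fields) in registry_keys
        truly_present = rec["pid"] in registry_pids
        if truly_present and matched:
            tp += 1
        elif truly_present:
            fn += 1
        elif matched:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (invented for illustration, not taken from the study).
screening = [
    {"pid": 1, "family": "Sato", "first": "Hana", "dob": "1960-01-02"},
    {"pid": 2, "family": "Suzuki", "first": "Ken", "dob": "1955-03-04"},
    {"pid": 3, "family": "Tanaka", "first": "Yui", "dob": "1970-05-06"},
    {"pid": 4, "family": "Ito", "first": "Ren", "dob": "1965-07-08"},
]
registry = [
    {"pid": 1, "family": "Sato", "first": "Hana", "dob": "1960-01-02"},
    # Same person as pid 2, but with a typing error in the first name.
    {"pid": 2, "family": "Suzuki", "first": "Kenn", "dob": "1955-03-04"},
    # Different person who shares family name and date of birth with pid 4.
    {"pid": 5, "family": "Ito", "first": "Mio", "dob": "1965-07-08"},
]

for fields in (["dob", "first"], ["dob", "family"]):
    sens, spec = evaluate(screening, registry, fields)
    print(fields, f"sensitivity={sens:.2f}", f"specificity={spec:.2f}")
```

In this toy example, a key error lowers sensitivity when the erroneous field is part of the key, while a coarser key that ignores that field recovers the record at the cost of a false match.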

RESULTS

Among key combinations with a matching specificity of 99% or higher and high sensitivity, date of birth and first name were used for the data simulating colorectal cancer, giving a matching sensitivity and specificity of 55.00% and 99.85%, respectively. For the data simulating breast cancer, date of birth and family name were used, giving a matching sensitivity and specificity of 88.71% and 99.98%, respectively. Assuming a cancer screening sensitivity and specificity of 90%, the apparent values decreased to 74.90% and 89.93%, respectively. A trial calculation was also performed on the same dataset using a combination with a matching specificity of 100%. When the matching sensitivity was 82.26%, the apparent screening sensitivity was maintained at 90% and the apparent screening specificity dropped to 89.89%, a small error from the original value. For 2^14 (16,384) data points, the execution time was 82 minutes and 26 seconds without parallelization and 11 minutes and 38 seconds with parallelization; 19.33% of the calculation time was spent at the data-holding institutions. Memory usage was 3.4 GB for the PDDI server and 2.7 GB for the data-holding institutions.
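The timing figures above can be put in perspective with a short piece of arithmetic. The sketch below only converts the reported execution times to seconds and derives the speedup from parallelization and the average time per data point; the helper `to_seconds` and the variable names are illustrative and not part of the study.

```python
def to_seconds(minutes, seconds):
    """Convert a minutes-and-seconds duration to seconds."""
    return 60 * minutes + seconds


n_points = 2 ** 14              # 16,384 data points, as reported
serial = to_seconds(82, 26)     # execution time without parallelization
parallel = to_seconds(11, 38)   # execution time with parallelization

speedup = serial / parallel     # how many times faster the parallel run is
per_point_serial = serial / n_points
per_point_parallel = parallel / n_points

print(f"speedup: {speedup:.2f}x")
print(f"seconds per data point: {per_point_serial:.3f} serial, "
      f"{per_point_parallel:.3f} parallel")
```

Under these figures, parallelization shortens the run by roughly a factor of seven.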

CONCLUSIONS

We demonstrated the rudimentary feasibility of introducing a PDDI system for cancer screening accuracy assessment. We plan to carry out matching experiments based on actual data and comparisons with existing methods.

Publisher

JMIR Publications Inc.
