Can We Trust Synthetic Data in Medicine? A Scoping Review of Privacy and Utility Metrics

Authors

Kaabachi Bayrem, Despraz Jérémie, Meurers Thierry, Otte Karen, Halilovic Mehmed, Prasser Fabian, Raisaro Jean Louis

Abstract

Introduction

Sharing and re-using health-related data beyond the scope of its initial collection is essential for accelerating research and for developing robust and trustworthy machine learning methods that can be translated into clinical settings. The sharing of synthetic data, artificially generated to resemble real patient data, is increasingly recognized as a promising means to enable such re-use while addressing the privacy concerns related to personal medical data. Nonetheless, no consensus exists yet on a standard approach for systematically and quantitatively evaluating the actual privacy gain and residual utility of synthetic data, which de facto hinders its adoption.

Objective

In this work, we present and systematize current knowledge in the field of synthetic health-related data evaluation, in terms of both privacy and utility. We provide insights into and critical analysis of the current state of the art, and we propose concrete directions and steps forward for the research community.

Methods

We assess and contextualize existing knowledge in the field through a scoping review and the creation of a common ontology that encompasses all the methods and metrics used to assess synthetic data. We follow the PRISMA-ScR methodology to perform data collection and knowledge synthesis.

Results

We include 92 studies in the scoping review and analyze and classify them according to the proposed ontology. We find 48 different methods for evaluating the residual statistical utility of synthetic data and 9 methods for evaluating the residual privacy risks. Moreover, we observe that there is currently no consensus among researchers on either individual metrics or families of metrics for evaluating the privacy and utility of synthetic data. Our findings on the privacy of synthetic data show an alarming tendency to trust the safety of synthetic data without properly evaluating it.

Conclusion

Although the use of synthetic data in healthcare promises to offer an easy and hassle-free alternative to real data, the lack of consensus on evaluation hinders the adoption of this new technology. We believe that, by raising awareness and providing a comprehensive taxonomy of evaluation methods that takes into account the current state of the literature, our work can foster the development and adoption of uniform approaches and consequently facilitate the use of synthetic data in the medical domain.

Publisher

Cold Spring Harbor Laboratory

