Non-Imaging Medical Data Synthesis for Trustworthy AI: A Comprehensive Survey

Author:

Xing Xiaodan1, Wu Huanjun1, Wang Lichao1, Stenson Iain2, Yong May2, Ser Javier Del3, Walsh Simon1, Yang Guang1

Affiliation:

1. Imperial College London, UK

2. Alan Turing Institute, UK

3. TECNALIA, Basque Research & Technology Alliance (BRTA), Spain

Abstract

Data quality is a key factor in the development of trustworthy AI in healthcare. A large volume of curated datasets with controlled confounding factors can improve the accuracy, robustness, and privacy of downstream AI algorithms. However, access to high-quality datasets is limited by the technical difficulties of data acquisition, and large-scale sharing of healthcare data is hindered by strict ethical restrictions. Data synthesis algorithms, which generate data with distributions similar to real clinical data, can serve as a potential solution to the scarcity of high-quality data during the development of trustworthy AI. However, state-of-the-art data synthesis algorithms, especially deep learning algorithms, focus more on imaging data while neglecting the synthesis of non-imaging healthcare data, including clinical measurements, medical signals and waveforms, and electronic healthcare records (EHRs). Therefore, in this paper, we review synthesis algorithms, particularly for non-imaging medical data, with the aim of supporting trustworthy AI in this domain. This tutorial-style review paper provides comprehensive descriptions of non-imaging medical data synthesis, covering aspects such as algorithms, evaluations, limitations, and future research directions.

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science, Theoretical Computer Science

