Non-Imaging Medical Data Synthesis for Trustworthy AI: A Comprehensive Survey


Xing Xiaodan1, Wu Huanjun1, Wang Lichao1, Stenson Iain2, Yong May2, Ser Javier Del3, Walsh Simon1, Yang Guang1


1. Imperial College London, UK

2. Alan Turing Institute, UK

3. TECNALIA, Basque Research & Technology Alliance (BRTA), Spain


Data quality is a key factor in the development of trustworthy AI in healthcare. A large volume of curated datasets with controlled confounding factors can improve the accuracy, robustness, and privacy of downstream AI algorithms. However, access to high-quality datasets is limited by the technical difficulties of data acquisition, and large-scale sharing of healthcare data is hindered by strict ethical restrictions. Data synthesis algorithms, which generate data with distributions similar to real clinical data, are a potential solution to the scarcity of high-quality data during the development of trustworthy AI. However, state-of-the-art data synthesis algorithms, especially deep learning algorithms, focus on imaging data and neglect the synthesis of non-imaging healthcare data, including clinical measurements, medical signals and waveforms, and electronic healthcare records (EHRs). In this paper, we therefore review synthesis algorithms specifically for non-imaging medical data, with the aim of supporting trustworthy AI in this domain. This tutorial-style review provides comprehensive descriptions of non-imaging medical data synthesis, covering algorithms, evaluations, limitations, and future research directions.


Association for Computing Machinery (ACM)


General Computer Science, Theoretical Computer Science







