Learning debiased graph representations from the OMOP common data model for synthetic data generation

Author:

Schulz Nicolas Alexander, Carus Jasmin, Wiederhold Alexander Johannes, Johanns Ole, Peters Frederik, Rath Natalie, Rausch Katharina, Holleczek Bernd, Katalinic Alexander, Nennecke Alice, Kusche Henrik, Heinrichs Vera, Eberle Andrea, Luttmann Sabine, Abnaof Khalid, Kim-Wanner Soo-Zin, Handels Heinz, Germer Sebastian, Halber Marco, Richter Martin, Pinnau Martin, Reiner David, Schaaf Jannik, Storf Holger, Hartz Tobias, Goeken Nils, Bösche Janina, Stein Alexandra, Weitmann Kerstin, Hoffmann Wolfgang, Labohm Louisa, Rudolph Christiane, Gundler Christopher, Ückert Frank

Abstract

Background: Generating synthetic patient data is crucial for medical research, but common approaches build on black-box models that do not allow for expert verification or intervention. We propose a highly available method that enables synthetic data generation from real patient records in a privacy-preserving and compliant manner, is interpretable, and allows for expert intervention.

Methods: Our approach ties together two established tools in medical informatics, namely OMOP as a data standard for electronic health records and Synthea as a data synthesis method. For this study, data pipelines were built that extract data from OMOP, convert them into time series format, learn temporal rules using two statistical algorithms (Markov chain, TARM) and three causal discovery algorithms (DYNOTEARS, J-PCMCI+, LiNGAM), and map the outputs into Synthea graphs. The graphs are evaluated quantitatively by their individual and relative complexity and qualitatively by medical experts.

Results: The algorithms were found to learn qualitatively and quantitatively different graph representations. Whereas the Markov chain results in extremely large graphs, TARM, DYNOTEARS, and J-PCMCI+ were found to reduce the data dimension during learning. The MultiGroupDirect LiNGAM algorithm was found not to be applicable to the problem at hand.

Conclusion: Only TARM and DYNOTEARS are practical algorithms for real-world data in this use case. As causal discovery is a method to debias purely statistical relationships, the gradient-based causal discovery algorithm DYNOTEARS was found to be most suitable.
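To make the Markov chain step named in the Methods more concrete, the following is a minimal sketch of how first-order transition probabilities could be estimated from per-patient event sequences and turned into a weighted directed graph. It is not the authors' pipeline: the function name `learn_markov_chain`, the `min_prob` pruning threshold, and the toy patient sequences are all hypothetical, and the extraction from OMOP and the mapping into Synthea's module format are not shown.

```python
from collections import Counter, defaultdict


def learn_markov_chain(sequences, min_prob=0.0):
    """Estimate first-order transition probabilities from event sequences.

    sequences: iterable of lists, each an ordered list of event labels
               for one patient (hypothetical input format).
    min_prob:  edges with an estimated probability below this value are
               dropped (illustrative pruning; remaining probabilities are
               NOT renormalized).
    Returns a dict: state -> {next_state: probability}.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count each observed transition between consecutive events.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1

    graph = {}
    for state, successors in counts.items():
        total = sum(successors.values())
        edges = {
            nxt: n / total
            for nxt, n in successors.items()
            if n / total >= min_prob
        }
        if edges:
            graph[state] = edges
    return graph


# Hypothetical toy data, invented purely for illustration.
patients = [
    ["hypertension", "diabetes", "nephropathy"],
    ["hypertension", "diabetes", "retinopathy"],
    ["hypertension", "stroke"],
    ["diabetes", "nephropathy"],
]

full_graph = learn_markov_chain(patients)
pruned_graph = learn_markov_chain(patients, min_prob=0.5)

for state, edges in full_graph.items():
    for nxt, p in edges.items():
        print(f"{state} -> {nxt}: {p:.2f}")
```

Because every observed transition becomes an edge, a graph learned this way grows quickly with the number of distinct events, which is in line with the abstract's remark that the Markov chain yields extremely large graphs. The `min_prob` threshold here is only one simple way to thin such a graph and is an assumption of this sketch.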

Funder

Universitätsklinikum Hamburg-Eppendorf (UKE)

Publisher

Springer Science and Business Media LLC

