Patient Representation Learning From Heterogeneous Data Sources and Knowledge Graphs Using Deep Collective Matrix Factorization: Evaluation Study

Published: 2022-01-20
Volume: 10
Issue: 1
Page: e28842
ISSN: 2291-9694
Container-title: JMIR Medical Informatics
Language: en
Short-container-title: JMIR Med Inform

Author:

Kumar Sajit, Nanelia Alicia, Mariappan Ragunathan, Rajagopal Adithya, Rajan Vaibhav

Abstract

Background: Patient representation learning aims to learn features, also called representations, from input sources automatically, often in an unsupervised manner, for use in predictive models. This obviates the need for cumbersome, time- and resource-intensive manual feature engineering, especially from unstructured data such as text, images, or graphs. Most previous techniques have used neural network–based autoencoders to learn patient representations, primarily from clinical notes in electronic medical records (EMRs). Knowledge graphs (KGs), with clinical entities as nodes and their relations as edges, can be extracted automatically from biomedical literature and provide complementary information to EMR data that have been found to provide valuable predictive signals.

Objective: This study aims to evaluate the efficacy of collective matrix factorization (CMF), both the classical variant and a recent neural architecture called deep CMF (DCMF), in integrating heterogeneous data sources from EMR and KG to obtain patient representations for clinical decision support tasks.

Methods: Using a recent formulation for obtaining graph representations through matrix factorization within the context of CMF, we infused auxiliary information during patient representation learning. We also extended the DCMF architecture to create a task-specific end-to-end model that learns to simultaneously find effective patient representations and predictions. We compared the efficacy of such a model to that of first learning unsupervised representations and then independently learning a predictive model. We evaluated patient representation learning using CMF-based methods and autoencoders for 2 clinical decision support tasks on a large EMR data set.

Results: Our experiments show that DCMF provides a seamless way for integrating multiple sources of data to obtain patient representations, both in unsupervised and supervised settings. Its performance in single-source settings is comparable with that of previous autoencoder-based representation learning methods. When DCMF is used to obtain representations from a combination of EMR and KG, where most previous autoencoder-based methods cannot be used directly, its performance is superior to that of previous nonneural methods for CMF. Infusing information from KGs into patient representations using DCMF was found to improve downstream predictive performance.

Conclusions: Our experiments indicate that DCMF is a versatile model that can be used to obtain representations from single and multiple data sources and combine information from EMR data and KGs. Furthermore, DCMF can be used to learn representations in both supervised and unsupervised settings. Thus, DCMF offers an effective way of integrating heterogeneous data sources and infusing auxiliary knowledge into patient representations.
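
To make the classical CMF idea named in the Methods more concrete, the following is a minimal sketch and not the authors' implementation. It assumes two toy matrices: X (patients by clinical entities, standing in for EMR data) and A (clinical entities by concepts, standing in for a KG relation matrix). The two matrices are factorized jointly so that the entity factors are shared. The function name `collective_mf`, the factor names P, E, and C, the squared-error loss, and the plain gradient descent optimizer are all illustrative assumptions. DCMF, as described in the abstract, is a neural architecture and is not reproduced here.

```python
import numpy as np


def collective_mf(X, A, k=4, lr=0.01, reg=0.01, steps=2000, seed=0):
    """Jointly factorize X ~ P @ E.T and A ~ E @ C.T with shared factors E.

    X: (patients x entities) matrix, A: (entities x concepts) matrix.
    Returns the factors P, E, C and the reconstruction error per step.
    All names and hyperparameters here are hypothetical choices.
    """
    rng = np.random.default_rng(seed)
    n_patients, n_entities = X.shape
    n_concepts = A.shape[1]

    # Small random initialization of the three latent factor matrices.
    P = 0.1 * rng.standard_normal((n_patients, k))
    E = 0.1 * rng.standard_normal((n_entities, k))
    C = 0.1 * rng.standard_normal((n_concepts, k))

    losses = []
    for _ in range(steps):
        # Residuals of the two reconstructions.
        Rx = P @ E.T - X
        Ra = E @ C.T - A
        losses.append(float((Rx ** 2).sum() + (Ra ** 2).sum()))

        # Gradients of 0.5 * (squared error + reg * squared norms).
        # E receives gradient from both matrices because it is shared.
        grad_P = Rx @ E + reg * P
        grad_E = Rx.T @ P + Ra @ C + reg * E
        grad_C = Ra.T @ E + reg * C

        P -= lr * grad_P
        E -= lr * grad_E
        C -= lr * grad_C

    return P, E, C, losses


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic binary toy data; not drawn from any real EMR or KG.
    X = (rng.random((12, 6)) < 0.5).astype(float)
    A = (rng.random((6, 5)) < 0.5).astype(float)
    P, E, C, losses = collective_mf(X, A, k=3, steps=500)
    print("patient representation shape:", P.shape)
    print("error decreased:", losses[-1] < losses[0])
```

In this sketch the rows of P play the role of patient representations. Because E must reconstruct both X and A, information from the second matrix can influence P indirectly through the shared factors, which is the sense in which auxiliary knowledge is "infused".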

Publisher

JMIR Publications Inc.

Subject

Health Information Management,Health Informatics


Cited by 1 article.

1. Towards electronic health record-based medical knowledge graph construction, completion, and applications: A literature study. Journal of Biomedical Informatics. 2023-07.