Author:
Swaminathan Akshay, Ren Alexander L., Wu Janet Y., Bhargava-Shah Aarohi, Lopez Ivan, Srivastava Ujwal, Alexopoulos Vassilis, Pizzitola Rebecca, Bui Brandon, Alkhani Layth, Lee Susan, Mohit Nathan, Seo Noel, Macedo Nicholas, Cheng Winson, Wang William, Tran Edward, Thomas Reena, Gevaert Olivier
Abstract
Background: Data on lines of therapy (LOTs) for cancer treatment is important for clinical oncology research, but LOTs are not explicitly recorded in electronic health records (EHRs). We present an efficient approach for clinical data abstraction and a flexible algorithm to derive LOTs from EHR-based medication data on patients with glioblastoma (GBM).
Methods: Non-clinicians were trained to abstract the diagnosis of GBM from EHRs, and their accuracy was compared to abstraction performed by clinicians. The resulting data was used to build a cohort of patients with a confirmed GBM diagnosis. An algorithm was developed to derive LOTs using structured medication data, accounting for the addition and discontinuation of therapies and drug class. Descriptive statistics were calculated, and time-to-next-treatment analysis was performed using the Kaplan-Meier method.
Results: Treating clinicians as the gold standard, non-clinicians abstracted GBM diagnosis with sensitivity 0.98, specificity 1.00, PPV 1.00, and NPV 0.90, suggesting that non-clinician abstraction of GBM diagnosis was comparable to clinician abstraction. Out of 693 patients with a confirmed diagnosis of GBM, 246 had structured information about the types of medications received. Of those, 165 (67.1%) received a first-line therapy (1L) of temozolomide, and the median time-to-next-treatment from the start of 1L was 179 days.
Conclusions: We developed a flexible, interpretable, and easy-to-implement algorithm to derive LOTs given EHR data on medication orders and administrations that can be used to create high-quality datasets for outcomes research. We also showed that the cost of chart abstraction can be reduced by training non-clinicians instead of clinicians.
Importance of the study: This study proposes an efficient and accurate method to extract unstructured data from electronic health records (EHRs) for cancer outcomes research.
The study addresses the limitations of manual abstraction of unstructured clinical data and presents a reproducible, low-cost workflow for clinical data abstraction and a flexible algorithm to derive lines of therapy (LOTs) from EHR-based structured medication data. The LOT data was used to conduct a descriptive treatment pattern analysis and a time-to-next-treatment analysis to demonstrate how EHR-derived unstructured data can be transformed to answer diverse clinical research questions. The study also investigates the feasibility of training non-clinicians to perform abstraction of GBM data, demonstrating that data quality similar to that of clinician abstraction can be achieved with detailed explanations of clinical documentation, best practices for chart review, and quantitative evaluation of abstraction performance. The findings of this study have important implications for improving cancer outcomes research and facilitating the analysis of EHR-derived treatment data.
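The agreement metrics reported above (sensitivity, specificity, PPV, NPV) follow from a 2x2 table comparing non-clinician labels against clinician labels treated as the gold standard. The following is a minimal sketch of that calculation, not the authors' code. The function name `diagnostic_metrics` and the counts in the usage example are hypothetical; the counts are invented for illustration and are not the study's data, which the abstract does not report.

```python
from typing import Dict


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute agreement metrics from a 2x2 table.

    tp: gold standard positive, abstractor positive
    fp: gold standard negative, abstractor positive
    fn: gold standard positive, abstractor negative
    tn: gold standard negative, abstractor negative
    """
    return {
        "sensitivity": _ratio(tp, tp + fn),  # share of true cases found
        "specificity": _ratio(tn, tn + fp),  # share of non-cases correctly excluded
        "ppv": _ratio(tp, tp + fp),          # share of positive calls that are correct
        "npv": _ratio(tn, tn + fn),          # share of negative calls that are correct
    }


if __name__ == "__main__":
    # Hypothetical counts, NOT taken from the study.
    counts = dict(tp=98, fp=0, fn=2, tn=18)
    for name, value in diagnostic_metrics(**counts).items():
        print(f"{name}: {value:.2f}")
```

With these illustrative counts, two missed cases out of 100 gold-standard positives leave specificity and PPV at 1.00 while pulling NPV down to 0.90, because the negative group is small. This is one way a high sensitivity can coexist with a noticeably lower NPV.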
Publisher
Cold Spring Harbor Laboratory