Accelerated curation of checkpoint inhibitor-induced colitis cases from electronic health records

Author:

Rahman Protiva1, Ye Cheng1, Mittendorf Kathleen F2, Lenoue-Newton Michele2, Micheel Christine2, Wolber Jan3, Osterman Travis2, Fabbri Daniel1

Affiliation:

1. Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA

2. Vanderbilt Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, Tennessee, USA

3. Pharmaceutical Diagnostics, GE Healthcare, Chalfont St Giles, UK

Abstract

Objective: Automatically identifying patients at risk of immune checkpoint inhibitor (ICI)-induced colitis allows physicians to improve patient care. However, predictive models require training data curated from electronic health records (EHR). Our objective is to automatically identify notes documenting ICI-colitis cases to accelerate data curation.

Materials and Methods: We present a data pipeline to automatically identify ICI-colitis from EHR notes, accelerating chart review. The pipeline relies on BERT, a state-of-the-art natural language processing (NLP) model. The first stage of the pipeline segments long notes using keywords identified through a logistic classifier and applies BERT to identify ICI-colitis notes. The next stage uses a second BERT model tuned to identify false positive notes and remove notes that were likely positive for mentioning colitis as a side-effect. The final stage further accelerates curation by highlighting the colitis-relevant portions of notes. Specifically, we use BERT's attention scores to find high-density regions describing colitis.

Results: The overall pipeline identified colitis notes with 84% precision and reduced the curator note review load by 75%. The segment BERT classifier had a high recall of 0.98, which is crucial to identify the low incidence (<10%) of colitis.

Discussion: Curation from EHR notes is a burdensome task, especially when the curation topic is complicated. Methods described in this work are not only useful for ICI colitis but can also be adapted for other domains.

Conclusion: Our extraction pipeline reduces manual note review load and makes EHR data more accessible for research.
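As an illustration only, the two non-BERT steps named in the methods (segmenting long notes around keywords, and locating a high-density region of attention scores) might be sketched as follows. This is not the authors' implementation: the function names `keyword_segments` and `densest_window`, the fixed token window, the example keyword set, and the sliding-window notion of "density" are all assumptions made for this sketch. In the described pipeline the keywords come from a logistic classifier and the scores from BERT's attention; here both are supplied by hand.

```python
from typing import List, Set, Tuple


def keyword_segments(
    tokens: List[str], keywords: Set[str], window: int = 3
) -> List[Tuple[int, int]]:
    """Return merged [start, end) token spans around each keyword hit.

    Hypothetical helper: the keyword set and window size are assumptions.
    """
    spans: List[Tuple[int, int]] = []
    for i, tok in enumerate(tokens):
        if tok.lower() not in keywords:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        if spans and start <= spans[-1][1]:
            # Overlapping or touching spans are merged into one segment.
            spans[-1] = (spans[-1][0], max(spans[-1][1], end))
        else:
            spans.append((start, end))
    return spans


def densest_window(scores: List[float], width: int) -> Tuple[int, int]:
    """Return the [start, end) window of `width` tokens with the largest score sum.

    Hypothetical helper: a fixed-width sliding window is one simple way to
    define a "high-density region"; the source does not specify the method.
    """
    if not scores:
        return (0, 0)
    width = min(width, len(scores))
    current = sum(scores[:width])
    best, best_start = current, 0
    for start in range(1, len(scores) - width + 1):
        # Slide the window one token to the right.
        current += scores[start + width - 1] - scores[start - 1]
        if current > best:
            best, best_start = current, start
    return (best_start, best_start + width)


# Toy note and hand-written scores, for illustration only.
note = "patient on nivolumab developed diarrhea and colitis after cycle two no rash noted today"
tokens = note.split()
segments = keyword_segments(tokens, {"colitis", "diarrhea"}, window=2)
highlight = densest_window([0.1, 0.1, 0.9, 0.8, 0.1], width=2)
```

In a real pipeline, each segment would be passed to the classifier, and the highlighted window would be mapped back to character offsets in the note for display to the curator.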

Funder

GE Healthcare and Vanderbilt University Medical Center

Publisher

Oxford University Press (OUP)

Subject

Health Informatics
