Extracting forced vital capacity from the electronic health record through natural language processing in rheumatoid arthritis‐associated interstitial lung disease

Author:

England Bryant R.1, Roul Punyasha1, Yang Yangyuna1, Hershberger Daniel2, Sayles Harlan3, Rojas Jorge4, Cannon Grant W.5, Sauer Brian C.5, Curtis Jeffrey R.6, Baker Joshua F.7, Mikuls Ted R.1

Affiliation:

1. VA Nebraska‐Western Iowa Health Care System & Division of Rheumatology & Immunology University of Nebraska Medical Center Omaha Nebraska USA

2. Division of Pulmonary, Critical Care, and Sleep Medicine University of Nebraska Medical Center Omaha Nebraska USA

3. Department of Biostatistics University of Nebraska Medical Center Omaha Nebraska USA

4. Seattle VA Seattle Washington USA

5. Division of Rheumatology VA Salt Lake City & University of Utah Salt Lake City Utah USA

6. Division of Clinical Immunology and Rheumatology University of Alabama at Birmingham Birmingham Alabama USA

7. Division of Rheumatology Corporal Michael J. Crescenz VA & University of Pennsylvania Philadelphia Pennsylvania USA

Abstract

Purpose: To develop a natural language processing (NLP) tool to extract forced vital capacity (FVC) values from electronic health record (EHR) notes in patients with rheumatoid arthritis‐interstitial lung disease (RA‐ILD).

Methods: We selected RA‐ILD patients (n = 7485) in the Veterans Health Administration (VA) between 2000 and 2020 using validated ICD‐9/10 codes. We identified numeric values in proximity to FVC string patterns from clinical notes in the EHR. Subsequently, we performed processing steps to account for variability in note structure, related pulmonary function test (PFT) output, and values copied across notes, then assigned dates from linked administrative procedure records. NLP‐derived FVC values were compared to values recorded directly from PFT equipment available on a subset of patients.

Results: We identified 5911 FVC values (n = 1844 patients) from PFT equipment and 15 383 values (n = 4982 patients) by NLP. Among 2610 date‐matched FVC values from NLP and PFT equipment, 95.8% of values were within 5% predicted. The mean (SD) difference was 0.09% (5.9), and values strongly correlated (r = 0.94, p < 0.001), with a precision of 0.87 (95% CI 0.86, 0.88). NLP captured more patients with longitudinal FVC values (n = 3069 vs. n = 1164). Mean (SD) change in FVC %‐predicted per year was similar between sources (−1.5 [30.0] NLP vs. −0.9 [16.6] PFT equipment; standardized response mean = 0.05 for both).

Conclusions: NLP of EHR notes increases the capture of accurate, longitudinal FVC values by three‐fold over PFT equipment. Use of this NLP tool can facilitate pharmacoepidemiologic research in RA‐ILD and other lung diseases by capturing this critical measure of disease severity.
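The extraction step described in Methods (numeric values near FVC string patterns, with handling of related PFT output and of values copied across notes) might look roughly like the Python sketch below. This is an illustration only, not the authors' tool: the regular expression, the 10 to 200 plausibility range, and the function names `extract_fvc_percent` and `drop_copied_values` are all assumptions made for this example, and the abstract does not specify how the actual rules were written.

```python
import re

# Hypothetical pattern (an assumption, not the published rule set):
# "FVC", not preceded by "/" (so the FEV1/FVC ratio is skipped), then an
# optional absolute volume in liters, then a number followed by "%".
FVC_PATTERN = re.compile(
    r"(?<![/\w])FVC\b[^0-9\n]{0,20}"
    r"(?:\d+(?:\.\d+)?\s*L\b[^0-9\n]{0,10})?"
    r"(\d{1,3}(?:\.\d+)?)\s*%",
    re.IGNORECASE,
)


def extract_fvc_percent(note_text):
    """Return candidate FVC %-predicted values found in one note."""
    values = []
    for match in FVC_PATTERN.finditer(note_text):
        value = float(match.group(1))
        # Assumed plausibility window to discard obvious mis-captures.
        if 10 <= value <= 200:
            values.append(value)
    return values


def drop_copied_values(dated_values):
    """Keep only the earliest note for each distinct value.

    dated_values: list of (ISO date string, value) pairs for one patient.
    A crude stand-in for copy-forward handling: a genuinely repeated
    identical result on a later test would also be dropped.
    """
    seen = set()
    unique = []
    for note_date, value in sorted(dated_values):
        if value not in seen:
            seen.add(value)
            unique.append((note_date, value))
    return unique
```

In a real pipeline the dates would come from linked procedure records rather than note dates, as the Methods state; the sketch uses note dates only to show the de-duplication idea.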

Funder

National Institute of Arthritis and Musculoskeletal and Skin Diseases

National Institute of General Medical Sciences

Rheumatology Research Foundation

U.S. Department of Defense

U.S. Department of Veterans Affairs

Publisher

Wiley

Subject

Pharmacology (medical), Epidemiology
