Using Recurrent Neural Networks to Extract High-Quality Information From Lung Cancer Screening Computerized Tomography Reports for Inter-Radiologist Audit and Feedback Quality Improvement

Author:

Zhang Yucheng (1), Grant Benjamin M.M. (2), Hope Andrew J. (3), Hung Rayjean J. (4, 5), Warkentin Matthew T. (4, 5), Lam Andrew C.L. (1, 2), Aggawal Reenika (1, 2), Xu Maria (2), Shepherd Frances A. (1, 2), Tsao Ming-Sound (1, 6), Xu Wei (5, 7, 8), Pakkal Mini (1, 9), Liu Geoffrey (1, 2, 5, 7, 10), McInnis Micheal C. (1, 9)

Affiliation:

1. Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada

2. Medical Oncology and Hematology, Princess Margaret Cancer Centre, Toronto, ON, Canada

3. Radiation Medicine Program, Princess Margaret Cancer Centre, and Department of Radiation Oncology, University of Toronto, Toronto, ON, Canada

4. Prosserman Centre for Population Health Research, Lunenfeld-Tanenbaum Research Institute, Sinai Health Systems, Toronto, ON, Canada

5. Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada

6. Laboratory Medicine and Pathology, University Health Network, Toronto, ON, Canada

7. Biostatistics, Princess Margaret Cancer Centre, Toronto, ON, Canada

8. Computational Biology and Medicine Program, Princess Margaret Cancer Centre, Toronto, ON, Canada

9. Division of Cardiothoracic Imaging, Joint Department of Medical Imaging, Toronto General Hospital, Toronto, ON, Canada

10. Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada

Abstract

PURPOSE Lung cancer screening programs generate a high volume of low-dose computed tomography (LDCT) reports that contain valuable information, typically in a free-text format. High-performance named-entity recognition (NER) models can extract relevant information from these reports automatically for inter-radiologist quality control.

METHODS Using LDCT report data from a longitudinal lung cancer screening program (8,305 reports; 3,124 participants; 2006-2019), we trained a rule-based model and two bidirectional long short-term memory (Bi-LSTM) NER neural network models to detect clinically relevant information from LDCT reports. Model performance was measured using F1 scores and compared with a published open-source radiology NER model (Stanza) in an independent evaluation set of 150 reports. The top-performing model was applied to a data set of 6,948 reports for an inter-radiologist quality control assessment.

RESULTS The best-performing model, a Bi-LSTM NER recurrent neural network model, had an overall F1 score of 0.950, which outperformed Stanza (F1 score = 0.872) and a rule-based NER model (F1 score = 0.809). Recall (sensitivity) for the best Bi-LSTM model ranged from 0.916 to 0.991 across entity types; precision (positive predictive value) ranged from 0.892 to 0.997. Test performance remained stable across time periods. On average, the number of identified entities differed 2.86-fold between the most and the least detailed radiologists.

CONCLUSION We built an open-source Bi-LSTM NER model that outperformed other open-source or rule-based radiology NER models. This model can extract clinically relevant information from lung cancer screening computerized tomography reports efficiently and with high accuracy, enabling audit and feedback to improve quality of patient care.
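
For readers less familiar with the metrics reported above, the following is a minimal sketch of how precision, recall, and F1 can be computed for an NER model at the entity level. It assumes exact-match scoring (a predicted entity counts only if its span and type both match a reference annotation); the abstract does not state which matching scheme the authors used. The entity labels and token offsets are hypothetical and serve only as an illustration.

```python
from typing import Set, Tuple

# An entity is (start_token, end_token, entity_type).
# The labels used below are hypothetical, not the paper's label set.
Entity = Tuple[int, int, str]


def precision_recall_f1(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Entity-level precision, recall, and F1 under exact-match scoring (an assumption)."""
    true_positives = len(gold & predicted)
    # Precision (positive predictive value): share of predicted entities that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall (sensitivity): share of reference entities that were found.
    recall = true_positives / len(gold) if gold else 0.0
    # F1 is the harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example: four reference entities, three predictions, two exact matches.
gold = {(0, 2, "NODULE"), (5, 6, "SIZE"), (9, 10, "LOCATION"), (12, 13, "SIZE")}
predicted = {(0, 2, "NODULE"), (5, 6, "SIZE"), (9, 11, "LOCATION")}

p, r, f = precision_recall_f1(gold, predicted)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

In this toy case the third prediction has the right type but the wrong span, so it counts as both a false positive and a missed reference entity, which is why exact-match scoring is the stricter of the common conventions.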

Publisher

American Society of Clinical Oncology (ASCO)

Subject

General Medicine
