Using a classification model for determining the value of liver radiological reports of patients with colorectal cancer

Published:2022-11-21 Volume:12
ISSN:2234-943X
Container-title:Frontiers in Oncology
Short-container-title:Front. Oncol.

Author:

Liu Wenjuan, Zhang Xi, Lv Han, Li Jia, Liu Yawen, Yang Zhenghan, Weng Xutao, Lin Yucong, Song Hong, Wang Zhenchang

Abstract

Background: Medical imaging is critical in clinical practice, and high-value radiological reports can positively assist clinicians. However, there is a lack of methods for determining the value of reports.

Objective: The purpose of this study was to establish an ensemble learning classification model using natural language processing (NLP) applied to the Chinese free text of radiological reports to determine their value for liver lesion detection in patients with colorectal cancer (CRC).

Methods: Radiological reports of upper abdominal computed tomography (CT) and magnetic resonance imaging (MRI) were divided into five categories according to the results of liver lesion detection in patients with CRC. NLP methods, including word segmentation, stop word removal, and n-gram language model establishment, were applied to each dataset. A bag-of-words model was then built, high-frequency words were selected as features, and an ensemble learning classification model was constructed. Several machine learning methods were applied, including logistic regression (LR) and random forest (RF). The accuracy of a priori selection of pertinent word strings was compared with that of the machine learning methods.

Results: The dataset of 2790 patients included CT without contrast (10.2%), CT with/without contrast (73.3%), MRI without contrast (1.8%), and MRI with/without contrast (14.6%). The ensemble learning classification model determined the value of reports effectively, reaching an accuracy of 95.91% on the CT with/without contrast dataset using XGBoost. Logistic regression, random forest, and support vector machine also achieved good classification accuracy, reaching 95.89%, 95.04%, and 95.00%, respectively. The results of XGBoost were visualized using a confusion matrix. The numbers of errors in categories I, II, and V were very small. ELI5 was used to select important words for each category. Words such as “no abnormality”, “suggest”, “fatty liver”, and “transfer” showed a relatively large degree of positive correlation with classification accuracy. The accuracy of the string pattern search method was lower than that of the machine learning models.

Conclusions: The learning classification model based on NLP was an effective tool for determining the value of radiological reports focused on liver lesions. The study made it possible to analyze the value of medical imaging examinations on a large scale.
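The feature-extraction steps named in the Methods (stop word removal, n-gram generation, a bag-of-words model, and selection of high-frequency terms as features) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the example reports are invented English strings, whitespace splitting stands in for Chinese word segmentation, and the stop-word list, the function names, and the `top_k` and `n_max` parameters are all hypothetical.

```python
from collections import Counter

# Hypothetical stop-word list; the study's actual list is not given in the abstract.
STOP_WORDS = {"the", "of", "is", "in"}


def preprocess(report):
    """Lowercase, split on whitespace (a stand-in for a word segmenter), drop stop words."""
    return [tok for tok in report.lower().split() if tok not in STOP_WORDS]


def ngrams(tokens, n_max=2):
    """Return all contiguous n-grams for n = 1..n_max, joined with spaces."""
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def build_vocabulary(reports, top_k=5, n_max=2):
    """Count n-grams over all reports and keep the top_k most frequent as features."""
    counts = Counter()
    for report in reports:
        counts.update(ngrams(preprocess(report), n_max))
    return [term for term, _ in counts.most_common(top_k)]


def vectorize(report, vocabulary, n_max=2):
    """Turn one report into a bag-of-words count vector over the vocabulary."""
    counts = Counter(ngrams(preprocess(report), n_max))
    return [counts[term] for term in vocabulary]


# Invented example reports, for illustration only.
reports = [
    "no abnormality in the liver",
    "fatty liver is suggested",
    "no abnormality of the liver parenchyma",
]
vocabulary = build_vocabulary(reports, top_k=5)
vectors = [vectorize(report, vocabulary) for report in reports]
```

The resulting count vectors are the kind of input that a classifier such as logistic regression, random forest, or XGBoost would then be trained on, with each report's category as the label.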

Publisher

Frontiers Media SA

Subject

Cancer Research,Oncology

Cited by 1 article.

1. Comprehensive Review of Multimodal Medical Data Analysis: Open Issues and Future Research Directions;Acta Informatica Pragensia;2022-12-26