Automated Classification of Free-Text Radiology Reports: Using Different Feature Extraction Methods to Identify Fractures of the Distal Fibula

Published:2023-05-09 Issue:08 Volume:195 Page:713-719
ISSN:1438-9029
Container-title:RöFo - Fortschritte auf dem Gebiet der Röntgenstrahlen und der bildgebenden Verfahren
language:de
Short-container-title:Rofo

Author:

Dewald Cornelia L.A.¹, Balandis Alina², Becker Lena S.¹, Hinrichs Jan B.¹, von Falck Christian¹, Wacker Frank K.¹, Laser Hans², Gerbel Svetlana², Winther Hinrich B.¹, Apfel-Starke Johanna²

Affiliation:

1. Institute for Diagnostic and Interventional Radiology, Hannover Medical School, Hannover, Germany

2. Centre for Information Management (ZIMt), Hannover Medical School, Hannover, Germany

Abstract

Purpose: Radiology reports consist mostly of free text, which makes it challenging to obtain structured data. Natural language processing (NLP) techniques transform free-text reports into machine-readable document vectors, which are important for creating reliable, scalable methods for data analysis. The aim of this study is to classify unstructured radiograph reports according to fractures of the distal fibula and to find the best text mining method.

Materials & Methods: We established a novel German-language report dataset: a dedicated search engine was used to identify radiographs of the ankle, and the reports were manually labeled according to fractures of the distal fibula. These data were used to establish a machine learning pipeline that implemented the text representation methods bag-of-words (BOW), term frequency-inverse document frequency (TF-IDF), principal component analysis (PCA), non-negative matrix factorization (NMF), latent Dirichlet allocation (LDA), and document embedding (doc2vec). The extracted document vectors were used to train neural networks (NN), support vector machines (SVM), and logistic regression (LR) to recognize distal fibula fractures. The results were compared via cross-tabulations of the accuracy (acc) and area under the curve (AUC).

Results: In total, 3268 radiograph reports were included, of which 1076 described a fracture of the distal fibula. Comparison of the text representation methods showed that BOW achieved the best results (AUC = 0.98; acc = 0.97), followed by TF-IDF (AUC = 0.97; acc = 0.96), NMF (AUC = 0.93; acc = 0.92), PCA (AUC = 0.92; acc = 0.9), LDA (AUC = 0.91; acc = 0.89), and doc2vec (AUC = 0.9; acc = 0.88). When comparing the different classifiers, NN (AUC = 0.91) proved to be superior to SVM (AUC = 0.87) and LR (AUC = 0.85).

Conclusion: An automated classification of unstructured reports of radiographs of the ankle can reliably detect findings of fractures of the distal fibula. A particularly suitable feature extraction method is the BOW model.
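
To make the abstract's pipeline concrete, the following is a minimal sketch of two of the named text representations (BOW and TF-IDF) feeding a logistic regression, scored with accuracy and AUC. It is an illustration under stated assumptions, not the authors' implementation: the toy reports are invented, the TF-IDF weighting shown is one common variant, the classifier is a plain gradient-descent logistic regression written with the standard library only, and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy reports (invented for illustration, NOT from the study's dataset).
# Label 1 = distal fibula fracture described, 0 = no such fracture.
REPORTS = [
    ("fraktur der distalen fibula mit dislokation", 1),
    ("weber b fraktur der distalen fibula", 1),
    ("distale fibula fraktur ohne dislokation", 1),
    ("kein nachweis einer fraktur", 0),
    ("keine frische knoecherne verletzung", 0),
    ("regelrechte stellung im sprunggelenk kein frakturnachweis", 0),
]


def tokenize(text):
    """Lowercase whitespace tokenizer (real reports need more careful preprocessing)."""
    return text.lower().split()


def build_vocab(docs):
    """Map each distinct token to a column index, in sorted order."""
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def bow_vector(doc, vocab):
    """Bag-of-words: raw term counts; word order and unknown tokens are ignored."""
    counts = Counter(tokenize(doc))
    return [float(counts.get(tok, 0)) for tok in vocab]


def tfidf_vectors(docs, vocab):
    """TF-IDF with idf = ln(N / df) + 1 (an assumed variant; vocab must come from docs)."""
    n = len(docs)
    bows = [bow_vector(d, vocab) for d in docs]
    df = [sum(1 for row in bows if row[j] > 0) for j in range(len(vocab))]
    return [[tf * (math.log(n / df[j]) + 1.0) for j, tf in enumerate(row)] for row in bows]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, epochs=300, lr=0.5):
    """Batch gradient descent on the logistic loss; returns (weights, bias)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict_proba(X, w, b):
    """Predicted probability of the positive class for each row of X."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def accuracy(y_true, scores, threshold=0.5):
    """Share of documents whose thresholded score matches the label."""
    hits = sum((s >= threshold) == bool(t) for t, s in zip(y_true, scores))
    return hits / len(y_true)


def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


docs = [text for text, _ in REPORTS]
labels = [label for _, label in REPORTS]
vocab = build_vocab(docs)
features = {
    "BOW": [bow_vector(d, vocab) for d in docs],
    "TF-IDF": tfidf_vectors(docs, vocab),
}
for name, X in features.items():
    w, b = train_logreg(X, labels)
    scores = predict_proba(X, w, b)
    # Training-set metrics on six toy reports; this is not a held-out evaluation.
    print(name, "acc:", accuracy(labels, scores), "AUC:", auc(labels, scores))
```

The metrics printed here are computed on the same six toy reports used for training, so they only demonstrate the mechanics. A comparison like the one reported in the abstract would require held-out data and the remaining representations (PCA, NMF, LDA, doc2vec) and classifiers (NN, SVM), which are not sketched here.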

Publisher

Georg Thieme Verlag KG

Subject

Radiology, Nuclear Medicine and Imaging

Link

http://www.thieme-connect.de/products/ejournals/pdf/10.1055/a-2061-6562.pdf
