A retrospective study of deep learning generalization across two centers and multiple models of X-ray devices using COVID-19 chest-X rays

Published:2024-06-25 Issue:1 Volume:14
ISSN:2045-2322
Container-title:Scientific Reports
language:en
Short-container-title:Sci Rep

Author:

Fernández-Miranda Pablo Menéndez, Fraguela Enrique Marqués, de Linera-Alperi Marta Álvarez, Cobo Miriam, del Barrio Amaia Pérez, González David Rodríguez, Vega José A., Iglesias Lara Lloret

Abstract

Generalization of deep learning (DL) algorithms is critical for the secure implementation of computer-aided diagnosis systems in clinical practice. However, broad generalization remains a challenge in machine learning. This research aims to identify and study potential factors that can affect the internal validation and generalization of DL networks, namely the institution where the images come from, the image processing applied by the X-ray device, and the type of response function of the X-ray device. For these purposes, a pre-trained convolutional neural network (CNN) (VGG16) was trained three times to classify COVID-19 and control chest radiographs with the same hyperparameters, but using different combinations of data acquired in two institutions with devices from three different X-ray device manufacturers. Regarding internal validation, the addition of images from an external institution to the training set did not modify the algorithm’s internal performance; however, the inclusion of images acquired by a device from a different manufacturer decreased the performance by up to 8% (p < 0.05). In contrast, generalization across institutions and X-ray devices with the same type of response function was achieved. Nonetheless, generalization was not observed across devices with different types of response function. This factor was the key impediment to achieving broad generalization in our research, followed by the device’s image processing and the inter-institutional differences, which reduced generalization performance to 18.9% (p < 0.05) and 9.8% (p < 0.05), respectively. Finally, clustering analysis with features extracted by the CNN was performed, revealing a substantial dependence of the feature values extracted by the pre-trained CNN on the X-ray device that acquired the images.
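The abstract reports performance differences between test sets together with p-values, but does not say which statistical test was used. As a rough illustration only, the sketch below shows one common way such a comparison could be made: a two-sided two-proportion z-test on the accuracies from two independent test sets. The function names and all counts are hypothetical and are not taken from the paper.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def two_proportion_z_test(correct_a, n_a, correct_b, n_b):
    """Two-sided z-test for a difference between two accuracies.

    Returns (accuracy_a - accuracy_b, p_value). Uses the pooled-proportion
    standard error, which assumes the two test sets are independent.
    This is an assumed choice of test, not the one used in the paper.
    """
    p_a, p_b = correct_a / n_a, correct_b / n_b
    pooled = (correct_a + correct_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both accuracies are exactly 0 or exactly 1: no evidence of a difference.
        return p_a - p_b, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a - p_b, p_value


# Hypothetical counts: correct predictions on an internal test set
# versus a test set acquired with a different device.
diff, p = two_proportion_z_test(correct_a=450, n_a=500, correct_b=380, n_b=500)
print(f"accuracy drop: {diff:.3f}, significant at 0.05: {p < 0.05}")
```

With real data, the counts would come from evaluating the same trained network on each held-out set, for example by applying `accuracy` to the labels and predictions for each institution or device.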

Funder

Consejo Superior de Investigaciones Científicas

Publisher

Springer Science and Business Media LLC

Link

https://www.nature.com/articles/s41598-024-64941-5.pdf
