Stress testing reveals gaps in clinic readiness of image-based diagnostic artificial intelligence models

Published: 2021-01-21 Issue: 1 Volume: 4
ISSN: 2398-6352
Container-title: npj Digital Medicine
Language: en
Short-container-title: npj Digit. Med.

Author:

Young Albert T., Fernandez Kristen, Pfau Jacob, Reddy Rasika, Cao Nhat Anh, von Franque Max Y., Johal Arjun, Wu Benjamin V., Wu Rachel R., Chen Jennifer Y., Fadadu Raj P., Vasquez Juan A., Tam Andrew, Keiser Michael J., Wei Maria L.

Abstract

Artificial intelligence models match or exceed dermatologists in melanoma image classification. Less is known about their robustness against real-world variations, and clinicians may incorrectly assume that a model with an acceptable area under the receiver operating characteristic curve or related performance metric is ready for clinical use. Here, we systematically assessed the performance of dermatologist-level convolutional neural networks (CNNs) on real-world non-curated images by applying computational “stress tests”. Our goal was to create a proxy environment in which to comprehensively test the generalizability of off-the-shelf CNNs developed without training or evaluation protocols specific to individual clinics. We found inconsistent predictions on images captured repeatedly in the same setting or subjected to simple transformations (e.g., rotation). Such transformations resulted in false positive or negative predictions for 6.5–22% of skin lesions across test datasets. Our findings indicate that models meeting conventionally reported metrics need further validation with computational stress tests to assess clinic readiness.
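The kind of rotation stress test the abstract describes can be illustrated with a minimal sketch. This is not the authors' code or protocol: the function names (`rotations`, `flip_rate`, `toy_predict`), the 0.5 decision threshold, and the restriction to 90-degree rotations are all assumptions made for illustration, and `toy_predict` is a deliberately orientation-sensitive stand-in for a real CNN. The sketch counts the fraction of images whose binary prediction changes under at least one rotation.

```python
import numpy as np


def rotations(image):
    """Yield the image rotated by 0, 90, 180 and 270 degrees."""
    for k in range(4):
        yield np.rot90(image, k)


def flip_rate(predict, images, threshold=0.5):
    """Return the fraction of images whose binary label changes under rotation.

    `predict` is a hypothetical callable that maps an image array to a
    probability in [0, 1]; a real model would be wrapped to fit this shape.
    """
    flipped = 0
    for image in images:
        # Collect the distinct binary labels seen across all rotations.
        labels = {predict(rotated) >= threshold for rotated in rotations(image)}
        # More than one distinct label means the prediction is inconsistent.
        flipped += len(labels) > 1
    return flipped / len(images)


def toy_predict(image):
    """Toy stand-in for a classifier: mean intensity of the top half.

    It depends on orientation on purpose, so that the stress test has
    something to detect. It is not a diagnostic model.
    """
    top_half = image[: image.shape[0] // 2]
    return float(top_half.mean())
```

With a real model, `predict` would wrap the network's forward pass, and the same loop could be extended to other simple transformations or to repeated captures of the same lesion.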

Funder

Melanoma Research Alliance

UCSF Helen Diller Family Comprehensive Cancer Center

UCSF Summer Explore Fellowship

Marguerite Schoeneman Award

Alameda-Contra Costa Medical Association Summer Fellowship

UCSF/UCB Joint Medical Program Thesis Grant

Doris Duke Charitable Foundation

Publisher

Springer Science and Business Media LLC

Subject

Health Information Management, Health Informatics, Computer Science Applications, Medicine (miscellaneous)

Link

http://www.nature.com/articles/s41746-020-00380-6.pdf

Reference: 41 articles.

1. Esteva, A. et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 542, 115–118 (2017).

2. Liu, Y. et al. A deep learning system for differential diagnosis of skin diseases. Nat. Med. 26, 900–908 (2020).

3. Han, S. S. et al. Keratinocytic skin cancer detection on the face using region-based convolutional neural network. JAMA Dermatol. 156, 29–37 (2020).

4. Han, S. S. et al. Augmented intelligence dermatology: deep neural networks empower medical professionals in diagnosing skin cancer and predicting treatment options for 134 skin disorders. J. Invest. Dermatol. 140, 1753–1761 (2020).