ChatGPT-3.5 and ChatGPT-4 dermatological knowledge level based on the Specialty Certificate Examination in Dermatology

Published:2023-08-04
ISSN:0307-6938
Container-title:Clinical and Experimental Dermatology
Language:en

Author:

Lewandowski Miłosz¹, Łukowicz Paweł², Świetlik Dariusz², Barańska-Rybak Wioletta¹

Affiliation:

1. Department of Dermatology, Venereology and Allergology, Faculty of Medicine

2. Division of Biostatistics and Neural Networks, Medical University of Gdansk, Gdansk, Poland

Abstract

Background: The global use of artificial intelligence (AI) has the potential to revolutionize the healthcare industry. Although AI is becoming more popular, there is still a lack of evidence on its use in dermatology.

Objectives: To determine the capacity of ChatGPT-3.5 and ChatGPT-4 to support dermatology knowledge and clinical decision-making in medical practice.

Methods: Three Specialty Certificate Examination in Dermatology tests, in English and Polish, consisting of 120 single-best-answer, multiple-choice questions each, were used to assess the performance of ChatGPT-3.5 and ChatGPT-4.

Results: ChatGPT-4 exceeded the 60% pass rate in every performed test, with a minimum of 80% and 70% correct answers for the English and Polish versions, respectively. ChatGPT-4 performed significantly better on each exam (P < 0.01), regardless of language, compared with ChatGPT-3.5. Furthermore, ChatGPT-4 answered clinical picture-type questions with an average accuracy of 93.0% and 84.2% for questions in English and Polish, respectively. The differences between the tests in Polish and English were not significant; however, ChatGPT-3.5 and ChatGPT-4 performed better overall in English than in Polish by an average of 8 percentage points for each test. In most of the tests, incorrect ChatGPT answers were highly correlated with a lower difficulty index, which denotes questions of higher difficulty (P < 0.05).

Conclusions: The dermatology knowledge level of ChatGPT was high, and ChatGPT-4 performed significantly better than ChatGPT-3.5. Although the use of ChatGPT will not replace a doctor’s final decision, physicians should support the development of AI in dermatology to raise the standards of medical care.

Publisher

Oxford University Press (OUP)

Subject

Dermatology

Link

https://academic.oup.com/ced/advance-article-pdf/doi/10.1093/ced/llad255/51555238/llad255.pdf

Reference: 22 articles.

1. Will ChatGPT transform healthcare?;Nature Med,2023

2. Human- versus artificial intelligence;Korteling;Front Artif Intell,2021

3. GPT-3: its nature, scope, limits, and consequences;Floridi;Minds Mach (Dordr),2020

Cited by 32 articles.

1. Beyond the Scalpel: Assessing ChatGPT's potential as an auxiliary intelligent virtual assistant in oral surgery;Computational and Structural Biotechnology Journal;2024-12

2. Assessment Study of ChatGPT-3.5’s Performance on the Final Polish Medical Examination: Accuracy in Answering 980 Questions;Healthcare;2024-08-16

3. Influence of Model Evolution and System Roles on ChatGPT’s Performance in Chinese Medical Licensing Exams: Comparative Study;JMIR Medical Education;2024-08-13

4. The accuracy of AI-assisted chatbots on the annual assessment test for emergency medicine residents;Journal of Medicine, Surgery, and Public Health;2024-08

5. A comparative analysis of the performance of chatGPT4, Gemini and Claude for the Polish Medical Final Diploma Exam and Medical-Dental Verification Exam;2024-07-29