Affiliations:
1. Division of Plastic Surgery, Mayo Clinic, Jacksonville, FL 32224, USA
2. Center for Digital Health, Mayo Clinic, Rochester, MN 55905, USA
Abstract
Background: OpenAI’s ChatGPT (San Francisco, CA, USA) and Google’s Gemini (Mountain View, CA, USA) are two large language models that show promise in improving and expediting medical decision making in hand surgery, and their applications in this field warrant evaluation. This study aims to evaluate ChatGPT-4 and Gemini in classifying hand injuries and recommending treatment. Methods: Gemini and ChatGPT were each presented twice with 68 fictionalized clinical vignettes of hand injuries. The models were asked to classify each injury using a specific classification system and to recommend surgical or nonsurgical treatment. Classifications were scored for correctness. Results were analyzed using descriptive statistics, a paired two-tailed t-test, and sensitivity testing. Results: Gemini correctly classified 70.6% of hand injuries and demonstrated superior classification ability over ChatGPT (mean score 1.46 vs. 0.87, p < 0.001). For management, ChatGPT demonstrated higher sensitivity than Gemini in recommending surgical intervention (98.0% vs. 88.8%) but lower specificity (68.4% vs. 94.7%). Gemini's responses were also more replicable than ChatGPT's. Conclusions: Large language models like ChatGPT and Gemini show promise in assisting medical decision making, particularly in hand surgery, with Gemini generally outperforming ChatGPT. These findings emphasize the importance of considering the strengths and limitations of different models when integrating them into clinical practice.
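For readers less familiar with the sensitivity and specificity figures reported above, the following is a minimal Python sketch of how such values are typically computed for a binary surgical versus nonsurgical recommendation. The function names and the counts in the usage lines are hypothetical and chosen only for illustration; they are not the study's data, and the abstract does not describe the authors' actual analysis code.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of cases that truly need surgery for which surgery was recommended."""
    total = true_pos + false_neg
    if total == 0:
        raise ValueError("no surgical cases to evaluate")
    return true_pos / total


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of truly nonsurgical cases for which nonsurgical care was recommended."""
    total = true_neg + false_pos
    if total == 0:
        raise ValueError("no nonsurgical cases to evaluate")
    return true_neg / total


# Hypothetical counts (NOT taken from the study), for illustration only.
sens = sensitivity(true_pos=45, false_neg=5)
spec = specificity(true_neg=16, false_pos=4)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```

A model that recommends surgery very readily will tend to score high on sensitivity and low on specificity, which is the pattern the abstract reports for ChatGPT relative to Gemini.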