Evaluating the diagnostic performance of a large language model‐powered chatbot for providing immunohistochemistry recommendations in dermatopathology

Author:

McCrary Myles R. (1, 2), Galambus Justine (2, 3), Chen Wei‐Shen (1, 2)

Affiliation:

1. Department of Anatomic and Clinical Pathology, Morsani College of Medicine, University of South Florida, Tampa, Florida, USA

2. Department of Dermatology and Cutaneous Surgery, Morsani College of Medicine, University of South Florida, Tampa, Florida, USA

3. Department of Internal Medicine, University of Texas Medical Branch, Galveston, Texas, USA

Abstract

Background: Large language model (LLM)‐powered chatbots such as ChatGPT have numerous applications. However, their effectiveness in dermatopathology has not been formally evaluated. Dermatopathological cases often require immunohistochemical workup. Here, we evaluate the performance of a chatbot in providing diagnostically useful information on immunohistochemistry relating to dermatological diseases.

Methods: We queried a commonly used chatbot for the immunophenotypes of 51 cutaneous diseases, including a diverse variety of epidermal, adnexal, hematolymphoid, and soft tissue entities. We requested it to provide references for each diagnosis. All tests were repeated, compiled, quantified, and then compared with established literature standards.

Results: Clustering analysis demonstrated that recommendations correlated with tumor type, suggesting chatbots can supply appropriate panels. However, a significant portion of recommendations were factually incorrect (13.9%). Citations were rarely clinically useful (24.5%). Many were confabulated (27.2%). Prompt responses for cutaneous adnexal lesions tended to be less accurate while literature references were less useful. Reference retrieval performance was associated with the number of PubMed entries per entity.

Conclusions: This foundational study suggests that LLM‐powered chatbots may be useful for generating immunohistochemical panels for dermatologic diagnoses. However, specific performance capabilities and biases must be considered. In addition, extreme caution is advised regarding the tendencies to fabricate material. Future models intentionally fine‐tuned to augment diagnostic medicine may prove to be valuable.

Publisher

Wiley

Cited by 2 articles.
