Structured Clinical Reasoning Prompt Enhances LLM’s Diagnostic Capabilities in Diagnosis Please Quiz Cases

Authors:

Sonoda Yuki, Kurokawa Ryo, Hagiwara Akifumi, Asari Yusuke, Fukushima Takahiro, Kanzawa Jun, Gonoi Wataru, Abe Osamu

Abstract

Background: Large language models (LLMs) show promise in medical diagnosis, but their performance varies with prompting. Recent studies suggest that modifying prompts may enhance diagnostic capabilities.

Objective: This study aimed to test whether a prompting approach that aligns with general clinical reasoning methodology, specifically separating the processes of summarizing clinical information and making diagnoses based on that summary rather than handling both in one step, can enhance an LLM's medical diagnostic capabilities.

Methods: 322 quiz questions from Radiology's Diagnosis Please cases (1998-2023) were used. We employed Claude 3.5 Sonnet, a state-of-the-art LLM, to compare three approaches: 1) a conventional zero-shot chain-of-thought prompt, as a baseline; 2) a two-step approach, in which the LLM first organizes the patient history and imaging findings and then provides diagnoses; and 3) a summary-only approach, using only the LLM-generated summary for diagnoses.

Results: The two-step approach significantly outperformed both the baseline and summary-only methods in diagnostic accuracy, as determined by McNemar tests. Primary diagnosis accuracy was 60.6% for the two-step approach, compared with 56.5% for the baseline (p=0.042) and 56.3% for summary-only (p=0.035). For the top three diagnoses, accuracy was 70.5%, 66.5%, and 65.5%, respectively (p=0.005 vs. baseline, p=0.008 vs. summary-only). No significant differences were observed between the baseline and summary-only approaches.

Conclusion: Our results indicate that a structured clinical reasoning approach enhances the LLM's diagnostic accuracy. This method shows potential as a valuable tool for deriving diagnoses from free-text clinical information. The approach aligns well with established clinical reasoning processes, suggesting its potential applicability in real-world clinical settings.
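The sketch below illustrates the two-step prompting workflow described in the Methods, assuming the Anthropic Python SDK. The model identifier, prompt wording, and helper names (ask, two_step_diagnosis) are illustrative assumptions, not the study's actual prompts.

```python
# Minimal sketch of the two-step prompting approach (assumptions, not the study's exact prompts).
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
MODEL = "claude-3-5-sonnet-20240620"  # assumed model identifier


def ask(prompt: str) -> str:
    """Send a single user prompt and return the model's text reply."""
    response = client.messages.create(
        model=MODEL,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text


def two_step_diagnosis(case_text: str) -> str:
    """Step 1: summarize history and imaging findings. Step 2: diagnose from the summary only."""
    summary = ask(
        "Organize the following case into a concise summary of the patient "
        f"history and the imaging findings:\n\n{case_text}"
    )
    return ask(
        "Based only on the summary below, list the three most likely "
        f"diagnoses in order of probability:\n\n{summary}"
    )
```

In this sketch, the summary-only condition would feed only the generated summary to the diagnostic prompt, while the baseline asks for diagnoses directly from the full case text in a single chain-of-thought call.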

Publisher

Cold Spring Harbor Laboratory
