Abstract
One of the major barriers to using large language models (LLMs) in medicine is the perception that they use uninterpretable methods to make clinical decisions that are inherently different from the cognitive processes of clinicians. In this manuscript we develop diagnostic reasoning prompts to study whether LLMs can imitate clinical reasoning while accurately forming a diagnosis. We find that GPT-4 can be prompted to mimic the common clinical reasoning processes of clinicians without sacrificing diagnostic accuracy. This is significant because an LLM that can imitate clinical reasoning to provide an interpretable rationale offers physicians a means to evaluate whether an LLM's response is likely correct and can be trusted for patient care. Prompting methods that use diagnostic reasoning have the potential to mitigate the “black box” limitations of LLMs, bringing them one step closer to safe and effective use in medicine.
Publisher
Springer Science and Business Media LLC
Cited by
29 articles.