Improving medical reasoning through retrieval and self-reflection with retrieval-augmented large language models

Published:2024-06-28 Issue:Supplement_1 Volume:40 Page:i119-i129
ISSN:1367-4803
Container-title:Bioinformatics
language:en

Author:

Jeong Minbyul¹, Sohn Jiwoong¹, Sung Mujeen², Kang Jaewoo¹,³

Affiliation:

1. Department of Computer Science, Korea University, Seoul 02841, Republic of Korea

2. Department of Software Convergence, School of Computing, Kyung Hee University, Republic of Korea

3. AIGEN Sciences, Seoul 04778, Republic of Korea

Abstract

Summary

Recent proprietary large language models (LLMs), such as GPT-4, have achieved a milestone in tackling diverse challenges in the biomedical domain, ranging from multiple-choice questions to long-form generations. To address challenges that still cannot be handled with the encoded knowledge of LLMs, various retrieval-augmented generation (RAG) methods have been developed by searching documents from the knowledge corpus and appending them unconditionally or selectively to the input of LLMs for generation. However, when applying existing methods to different domain-specific problems, poor generalization becomes apparent, leading to fetching incorrect documents or making inaccurate judgments. In this paper, we introduce Self-BioRAG, a framework reliable for biomedical text that specializes in generating explanations, retrieving domain-specific documents, and self-reflecting generated responses. We utilize 84k filtered biomedical instruction sets to train Self-BioRAG that can assess its generated explanations with customized reflective tokens. Our work proves that domain-specific components, such as a retriever, domain-related document corpus, and instruction sets are necessary for adhering to domain-related instructions. Using three major medical question-answering benchmark datasets, experimental results of Self-BioRAG demonstrate significant performance gains by achieving a 7.2% absolute improvement on average over the state-of-the-art open-foundation model with a parameter size of 7B or less. Similarly, Self-BioRAG outperforms RAG by 8% Rouge-1 score in generating more proficient answers on two long-form question-answering benchmarks on average. Overall, we analyze that Self-BioRAG finds the clues in the question, retrieves relevant documents if needed, and understands how to answer with information from retrieved documents and encoded knowledge as a medical expert does. We release our data and code for training our framework components and model weights (7B and 13B) to enhance capabilities in biomedical and clinical domains.

Availability and implementation

Self-BioRAG is available at https://github.com/dmis-lab/self-biorag.
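
The abstract describes a loop in which the model decides whether to retrieve, fetches documents if needed, and then assesses its own candidate responses. The sketch below is only a toy illustration of that control flow, not the authors' implementation: word overlap stands in for the trained retriever and for the reflective tokens, and every name in it (`should_retrieve`, `retrieve`, `critique`, `answer_question`) is hypothetical.

```python
import re


def tokens(text):
    """Lowercased word set used by the toy retriever and critic."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def should_retrieve(question, encoded_knowledge):
    """Assumption: stands in for a 'retrieve or not' reflective decision.
    Here we retrieve whenever the question is not already memorized."""
    return question not in encoded_knowledge


def retrieve(question, corpus, k=2):
    """Toy retriever: rank documents by word overlap with the question."""
    q = tokens(question)
    return sorted(corpus, key=lambda d: len(q & tokens(d)), reverse=True)[:k]


def critique(question, evidence):
    """Assumption: stands in for self-reflection on a candidate.
    Score = fraction of question words that the evidence covers."""
    q = tokens(question)
    return len(q & tokens(evidence)) / len(q) if q else 0.0


def answer_question(question, corpus, encoded_knowledge):
    # Answer from encoded knowledge when retrieval is judged unnecessary.
    if not should_retrieve(question, encoded_knowledge):
        return {"answer": encoded_knowledge[question],
                "evidence": None, "retrieved": False}
    # Otherwise build one candidate per retrieved document and keep the
    # candidate that the critic scores highest. The toy "generation"
    # step simply quotes the document.
    docs = retrieve(question, corpus)
    best = max(docs, key=lambda d: critique(question, d))
    return {"answer": best, "evidence": best, "retrieved": True}


if __name__ == "__main__":
    corpus = [
        "Metformin is a first-line drug for type 2 diabetes.",
        "Aspirin inhibits platelet aggregation.",
    ]
    memory = {"What is the normal human body temperature?": "About 37 degrees Celsius."}
    print(answer_question("Which drug is first-line for type 2 diabetes?", corpus, memory))
    print(answer_question("What is the normal human body temperature?", corpus, memory))
```

In the framework the abstract describes, both the retrieval decision and the assessment come from a trained model emitting reflective tokens; the overlap heuristics above only mark where those decisions would sit in the loop.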

Funder

National Research Foundation of Korea

Ministry of Health & Welfare, Republic of Korea

Ministry of Science and ICT

Kyung Hee University

Institute of Information & Communications Technology Planning & Evaluation

MSIT

Publisher

Oxford University Press (OUP)

Link

https://academic.oup.com/bioinformatics/article-pdf/40/Supplement_1/i119/58355028/btae238.pdf

References: 46 articles.

Cited by 1 article.

1. Multimodal Large Language Model Passes Specialty Board Examination and Surpasses Human Test-Taker Scores: A Comparative Analysis Examining the Stepwise Impact of Model Prompting Strategies on Performance; 2024-07-29