Author:
Sadeq Mohammed Ahmed,Ghorab Reem Mohamed Farouk,Ashry Mohamed Hady,Abozaid Ahmed Mohamed,Banihani Haneen A.,Salem Moustafa,Aisheh Mohammed Tawfiq Abu,Abuzahra Saad,Mourid Marina Ramzy,Assker Mohamad Monif,Ayyad Mohammed,Moawad Mostafa Hossam El Din
Abstract
AbstractLarge language models (LLMs) like ChatGPT have potential applications in medical education such as helping students study for their licensing exams by discussing unclear questions with them. However, they require evaluation on these complex tasks. The purpose of this study was to evaluate how well publicly accessible LLMs performed on simulated UK medical board exam questions. 423 board-style questions from 9 UK exams (MRCS, MRCP, etc.) were answered by seven LLMs (ChatGPT-3.5, ChatGPT-4, Bard, Perplexity, Claude, Bing, Claude Instant). There were 406 multiple-choice, 13 true/false, and 4 "choose N" questions covering topics in surgery, pediatrics, and other disciplines. The accuracy of the output was graded. Statistics were used to analyze differences among LLMs. Leaked questions were excluded from the primary analysis. ChatGPT 4.0 scored (78.2%), Bing (67.2%), Claude (64.4%), and Claude Instant (62.9%). Perplexity scored the lowest (56.1%). Scores differed significantly between LLMs overall (p < 0.001) and in pairwise comparisons. All LLMs scored higher on multiple-choice vs true/false or “choose N” questions. LLMs demonstrated limitations in answering certain questions, indicating refinements needed before primary reliance in medical education. However, their expanding capabilities suggest a potential to improve training if thoughtfully implemented. Further research should explore specialty specific LLMs and optimal integration into medical curricula.
Funder
Misr University for Science & Technology
Publisher
Springer Science and Business Media LLC
Reference29 articles.
1. Ramesh, A., Kambhampati, C., Monson, J. & Drew, P. Artificial intelligence in medicine. Ann. R. Coll. Surg. Engl. 86(5), 334–338. https://doi.org/10.1308/147870804290 (2004).
2. McCarthy, J., Minsky, M. L., Rochester, N. & Shannon, C. E. A proposal for the dartmouth summer research project on artificial intelligence, August 31, 1955. AIMag 27(4), 12. https://doi.org/10.1609/aimag.v27i4.1904 (2006).
3. Mbakwe, A. B., Lourentzou, I., Celi, L. A., Mechanic, O. J. & Dagan, A. ChatGPT passing USMLE shines a spotlight on the flaws of medical education. PLOS Digit Health 2(2), e0000205. https://doi.org/10.1371/journal.pdig.0000205 (2023).
4. Dave, T., Athaluri, S. A. & Singh, S. ChatGPT in medicine: An overview of its applications, advantages, limitations, future prospects, and ethical considerations. Front. Artif. Intell. 6, 1169595. https://doi.org/10.3389/frai.2023.1169595 (2023).
5. Kelly, S. Microsoft opens up its AI-powered Bing to all users. CNN. [Online]. Available: https://edition.cnn.com/2023/05/04/tech/microsoft-bing-updates/index.html#:~:text=Bing%20now%20gets%20more%20than,features%20to%20its%20search%20engine.
Cited by
1 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献