Abstract
Background
The rise of artificial intelligence (AI) in medicine has revealed the potential of ChatGPT as a pivotal tool in medical diagnosis and treatment. This study assesses the efficacy of ChatGPT versions 3.5 and 4.0 in addressing renal cell carcinoma (RCC) clinical inquiries. Notably, fine-tuning and iterative optimization of the model corrected ChatGPT’s limitations in this area.
Methods
In our study, 80 RCC-related clinical questions from urology experts were posed three times to both ChatGPT 3.5 and ChatGPT 4.0, seeking binary (yes/no) responses. We then statistically analyzed the answers. Finally, we fine-tuned the GPT-3.5 Turbo model using these questions, and assessed its training outcomes.
Results
We found that the average accuracy rates of answers provided by ChatGPT versions 3.5 and 4.0 were 67.08% and 77.50%, respectively. ChatGPT 4.0 outperformed ChatGPT 3.5, with a higher accuracy rate in responses (p < 0.05). By counting the number of correct responses to the 80 questions, we then found that although ChatGPT 4.0 performed better (p < 0.05), both versions were subject to instability in answering. Finally, by fine-tuning the GPT-3.5 Turbo model, we found that the correct rate of responses to these questions could be stabilized at 93.75%. Iterative optimization of the model can result in 100% response accuracy.
Conclusion
We compared ChatGPT versions 3.5 and 4.0 in addressing clinical RCC questions, identifying their limitations. By applying the GPT-3.5 Turbo fine-tuned model iterative training method, we enhanced AI strategies in renal oncology. This approach is set to enhance ChatGPT’s database and clinical guidance capabilities, optimizing AI in this field.
Similar content being viewed by others
Availability of data and materials
All data generated and/or analyzed during this study are included in this published article.
References
Hamet P, Tremblay J. Artificial intelligence in medicine. Metab Clinic Exp. 2017;69(Suppl):S36–40. https://doi.org/10.1016/j.metabol.2017.01.011.
Tsakos E, Xydias EM, Ziogas AC, et al. Surgical and quality of life outcomes following robotic-assisted (da Vinci) laparoscopic repair of vesicovaginal fistula: a case report and video demonstration. Cureus. 2023;15(7):e42171. https://doi.org/10.7759/cureus.42171.
Sahu A, Mishra J, Kushwaha N. Artificial intelligence (AI) in drugs and pharmaceuticals. Comb Chem High Throughput Screen. 2022;25(11):1818–37. https://doi.org/10.2174/1386207325666211207153943.
Anta JA, Martinez-Ballestero I, Eiroa D, Garcia J, Rodriguez-Comas J. Artificial intelligence for the detection of pancreatic lesions. Int J Comput Assist Radiol Surg. 2022;17(10):1855–65. https://doi.org/10.1007/s11548-022-02706-z.
Cascella M, Montomoli J, Bellini V, Bignami E. Evaluating the feasibility of ChatGPT in healthcare: an analysis of multiple clinical and research scenarios. J Med Syst. 2023;47(1):33. https://doi.org/10.1007/s10916-023-01925-4.
Moor M, Banerjee O, Abad ZSH, et al. Foundation models for generalist medical artificial intelligence. Nature. 2023;616(7956):259–65. https://doi.org/10.1038/s41586-023-05881-4.
OpenAI. ChatGPT: Optimizing language models for dialogue. Available at: https://openai.com/blog/chatgpt. Accessed 1 Sep 2023.
Yeo YH, Samaan JS, Ng WH, et al. Assessing the performance of ChatGPT in answering questions regarding cirrhosis and hepatocellular carcinoma. Clin Mol Hepatol. 2023;29(3):721–32. https://doi.org/10.3350/cmh.2023.0089.
Eysenbach G. The role of ChatGPT, generative language models, and artificial intelligence in medical education: a conversation with ChatGPT and a call for papers. JMIR Med Educ. 2023;9:e46885. https://doi.org/10.2196/46885.
Gilson A, Safranek CW, Huang T, et al. How does ChatGPT perform on the United States medical licensing examination? The implications of large language models for medical education and knowledge assessment. JMIR Med Educ. 2023;9:e45312. https://doi.org/10.2196/45312.
Thirunavukarasu AJ, Ting DSJ, Elangovan K, Gutierrez L, Tan TF, Ting DSW. Large language models in medicine. Nat Med. 2023;29(8):1930–40. https://doi.org/10.1038/s41591-023-02448-8.
Siegel RL, Miller KD, Wagle NS, Jemal A. Cancer statistics, 2023. CA Cancer J Clin. 2023;73(1):17–48. https://doi.org/10.3322/caac.21763.
Chen DY, Uzzo RG. Evaluation and management of the renal mass. Med Clin North Am. 2011;95(1):179–89. https://doi.org/10.1016/j.mcna.2010.08.021.
Murai M, Oya M. Renal cell carcinoma: etiology, incidence and epidemiology. Curr Opin Urol. 2004;14(4):229–33. https://doi.org/10.1097/01.mou.0000135078.04721.f5.
Ljungberg B, Albiges L, Abu-Ghanem Y, et al. European association of urology guidelines on renal cell carcinoma: the 2022 update. Eur Urol. 2022;82(4):399–410. https://doi.org/10.1016/j.eururo.2022.03.006.
Rosiello G, Larcher A, Montorsi F, Capitanio U. Renal cancer: overdiagnosis and overtreatment. World J Urol. 2021;39(8):2821–3. https://doi.org/10.1007/s00345-021-03798-z.
Sedaghat S. Early applications of ChatGPT in medical practice, education and research. Clin Med. 2023;23(3):278–9. https://doi.org/10.7861/clinmed.2023-0078.
Wu T, He S, Liu J, et al. A brief overview of ChatGPT: the history, status quo and potential future development. IEEE/CAA J Automatica Sinica. 2023;10(5):1122–36. https://doi.org/10.1109/JAS.2023.123618.
Ruksakulpiwat S, Kumar A, Ajibade A. Using ChatGPT in medical research: current status and future directions. J Multidiscip Healthc. 2023;16:1513–20. https://doi.org/10.2147/JMDH.S413470.
Capitanio U, Bensalah K, Bex A, et al. Epidemiology of renal cell carcinoma. Eur Urol. 2019;75(1):74–84. https://doi.org/10.1016/j.eururo.2018.08.036.
Campbell SC, Clark PE, Chang SS, Karam JA, Souter L, Uzzo RG. Renal mass and localized renal cancer: evaluation, management, and follow-up: AUA guideline: part I. J Urol. 2021;206(2):199–208. https://doi.org/10.1097/JU.0000000000001911.
Kung TH, Cheatham M, Medenilla A, et al. Performance of ChatGPT on USMLE: potential for AI-assisted medical education using large language models. PLOS Digit Health. 2023;2(2):e0000198. https://doi.org/10.1371/journal.pdig.0000198.
Kitamura FC. ChatGPT Is shaping the future of medical writing but still requires human judgment. Radiology. 2023;307(2):e230171. https://doi.org/10.1148/radiol.230171.
Dave T, Athaluri SA, Singh S. ChatGPT in medicine: an overview of its applications, advantages, limitations, future prospects, and ethical considerations. Front Artif Intell. 2023;6:1169595. https://doi.org/10.3389/frai.2023.1169595.
Acknowledgment
The authors gratefully thank the editors and reviewers for their constructive suggestions to improve this manuscript.
Funding
Funding for this study was received from the National Natural Science Foundation of China (82102171); Shenzhen Science and Technology Innovation Commission (RCJC20200714114557005); the National Natural Science Foundation of China (Tianyuan Fund for Mathematics: 12326610) and the Shenzhen Medical Research Fund (SMRF: A2302048).
Author information
Authors and Affiliations
Contributions
Conceptualizing and designing the experiments: JH, SW and SZ. Analyzed the data: RL, AZ, LP and XX. Contributed reagents/materials/analysis: JZ, FW and FY. Wrote the manuscript: RL, AZ, SZ, LP and XX. All authors have read and approved the final manuscript.
Corresponding authors
Ethics declarations
Disclosure
Rui Liang, Anguo Zhao, Lei Peng, Xiaojian Xu, Jianye Zhong, Fan Wu, Fulin Yi, Shaohua Zhang, Song Wu, and Jianquan Hou declare they have no conflicts of interest in relation to this work.
Consent for publication
Not applicable.
Ethics approval and consent to participate
Not applicable.
Provenance and peer review
Not commissioned, externally peer reviewed.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Supplementary file 2 (MP4 73340 KB)
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liang, R., Zhao, A., Peng, L. et al. Enhanced Artificial Intelligence Strategies in Renal Oncology: Iterative Optimization and Comparative Analysis of GPT 3.5 Versus 4.0. Ann Surg Oncol (2024). https://doi.org/10.1245/s10434-024-15107-0
Received:
Accepted:
Published:
DOI: https://doi.org/10.1245/s10434-024-15107-0