Examining the Efficacy of ChatGPT in Marking Short-Answer Assessments in an Undergraduate Medical Program

Published:2024-01-19 Issue:1 Volume:3 Page:32-43
ISSN:2813-141X
Container-title:International Medical Education
language:en
Short-container-title:IME

Author:

Morjaria Leo¹, Burns Levi¹, Bracken Keyna¹², Levinson Anthony J.¹, Ngo Quang N.¹², Lee Mark², Sibbald Matthew¹²

Affiliation:

1. Michael G. DeGroote School of Medicine, McMaster University, Hamilton, ON L8P 1H6, Canada

2. McMaster Education Research, Innovation and Theory (MERIT) Program, McMaster University, Hamilton, ON L8P 1H6, Canada

Abstract

Traditional approaches to marking short-answer questions face limitations in timeliness, scalability, inter-rater reliability, and faculty time costs. Harnessing generative artificial intelligence (AI) to address some of these shortcomings is attractive. This study aims to validate the use of ChatGPT for evaluating short-answer assessments in an undergraduate medical program. Ten questions from the pre-clerkship medical curriculum were randomly chosen, and for each, six previously marked student answers were collected. These sixty answers were evaluated by ChatGPT in July 2023 under four conditions: with both a rubric and standard, with only a standard, with only a rubric, and with neither. ChatGPT displayed good Spearman correlations with a single human assessor (r = 0.6–0.7, p < 0.001) across all conditions, with the absence of a standard or rubric yielding the best correlation. Scoring differences were common (65–80%), but score adjustments of more than one point were less frequent (20–38%). Notably, the absence of a rubric resulted in systematically higher scores (p < 0.001, partial η² = 0.33). Our findings demonstrate that ChatGPT is a viable, though imperfect, assistant to human assessment, performing comparably to a single expert assessor. This study serves as a foundation for future research on AI-based assessment techniques with potential for further optimization and increased reliability.
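The agreement measures named in the abstract (a Spearman rank correlation between human and ChatGPT scores, the share of answers scored differently, and the share differing by more than one point) can be illustrated with a minimal sketch. The scores below are hypothetical and are not data from the study; the function names are illustrative, and the study's actual analysis procedure is not described in this record.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def share_differing(human, model, threshold=0):
    """Fraction of answers where |human - model| exceeds the threshold."""
    return sum(abs(h - m) > threshold for h, m in zip(human, model)) / len(human)


# Hypothetical scores for six answers (NOT taken from the study).
human_scores = [5, 3, 4, 2, 1, 4]
model_scores = [4, 3, 5, 2, 3, 4]

rho = spearman(human_scores, model_scores)
any_diff = share_differing(human_scores, model_scores)                  # any difference
large_diff = share_differing(human_scores, model_scores, threshold=1)   # more than one point
print(f"rho={rho:.2f}, differing={any_diff:.0%}, differing by >1 point={large_diff:.0%}")
```

Ranking with tie averaging matters here because marks on a short scale repeat often; a correlation computed on raw scores would be Pearson rather than Spearman.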

Publisher

MDPI AG

Link

https://www.mdpi.com/2813-141X/3/1/4/pdf
