A validity study of COMLEX-USA Level 3 with the new test design

Authors:

Xia Mao¹, John R. Boulet¹, Jeanne M. Sandella¹, Michael F. Oliverio², Larissa Smith¹

Affiliations:

1. National Board of Osteopathic Medical Examiners, Chicago, IL, USA

2. Adjunct Clinical Faculty, Departments of Family Practice and OMM, NYIT-COM, North Bellmore, NY, USA

Abstract

Context: The National Board of Osteopathic Medical Examiners (NBOME) administers the Comprehensive Osteopathic Medical Licensing Examination of the United States (COMLEX-USA), a three-level examination designed for licensure for the practice of osteopathic medicine. The examination design for COMLEX-USA Level 3 (L3) was changed in September 2018 to a two-day computer-based examination with two components: a multiple-choice question (MCQ) component with single-best-answer items and a clinical decision-making (CDM) case component with extended multiple-choice (EMC) and short-answer (SA) questions. Continued validation of the L3 examination, especially under the new design, is essential for the appropriate interpretation and use of the test scores.

Objectives: The purpose of this study is to gather evidence to support the validity of the L3 examination scores under the new design, drawing on sources of evidence based on Kane's validity framework.

Methods: Kane's validity framework comprises four components of evidence to support the validity argument: Scoring, Generalization, Extrapolation, and Implication/Decision. In this study, we gathered data from various sources and conducted analyses to provide evidence that the L3 examination measures what it is intended to measure. These analyses include reviewing the content coverage of the L3 examination, documenting the scoring and reporting processes, estimating the reliability and decision accuracy/consistency of the scores, quantifying associations between the scores from the MCQ and CDM components and between scores from different competency domains of the L3 examination, exploring the relationships between L3 scores and scores from a performance-based assessment that measures related constructs, performing subgroup comparisons, and describing and justifying the criterion-referenced standard-setting process. The analysis data consist of first-attempt test scores for 8,366 candidates who took the L3 examination between September 2018 and December 2019. The performance-based assessment used as a criterion measure in this study is the COMLEX-USA Level 2 Performance Evaluation (L2-PE).

Results: All assessment forms were built through an automated test assembly (ATA) procedure to maximize parallelism in content coverage and statistical properties across forms. Scoring and reporting follow industry-standard quality-control procedures. The inter-rater reliability of SA rating and the decision accuracy and decision consistency of pass/fail classifications are all very high. There is a statistically significant positive association between the MCQ and CDM components of the L3 examination. The patterns of associations, both within the L3 subscores and with the L2-PE domain scores, fit with what is being measured. Subgroup comparisons by gender, race, and first language showed the expected small differences in mean scores between subgroups within each category and yielded findings consistent with those described in the literature. The L3 pass/fail standard was established through a defensible criterion-referenced procedure.

Conclusions: This study provides some additional validity evidence for the L3 examination based on Kane's validity framework. The validity of any measurement must be established through ongoing evaluation of the related evidence. The NBOME will continue to collect evidence to support validity arguments for the COMLEX-USA examination series.
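As a rough illustration of two of the statistics named in the Methods (the association between the MCQ and CDM component scores, and inter-rater agreement on SA ratings), the minimal sketch below uses simulated, hypothetical data. It is not the NBOME's scoring or analysis code; the score scale, sample size, and agreement rate are assumptions chosen only to make the example run.

```python
# Illustrative sketch with simulated data (not the study's actual analysis).
import numpy as np

rng = np.random.default_rng(0)
n = 500  # hypothetical number of first-attempt candidates

# Simulated component scores on an assumed scale (placeholder values only).
mcq = rng.normal(500, 80, n)
cdm = 0.6 * mcq + rng.normal(0, 60, n)

# Association between the MCQ and CDM components (Pearson r).
r_mcq_cdm = np.corrcoef(mcq, cdm)[0, 1]

# Inter-rater agreement on dichotomous SA ratings: Cohen's kappa for two
# hypothetical raters scoring the same responses, with ~90% raw agreement.
rater1 = rng.integers(0, 2, n)
rater2 = np.where(rng.random(n) < 0.9, rater1, 1 - rater1)

p_obs = np.mean(rater1 == rater2)                    # observed agreement
p1, p2 = rater1.mean(), rater2.mean()
p_exp = p1 * p2 + (1 - p1) * (1 - p2)                # chance agreement
kappa = (p_obs - p_exp) / (1 - p_exp)

print(f"MCQ-CDM correlation: {r_mcq_cdm:.2f}, SA rater kappa: {kappa:.2f}")
```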

Publisher

Walter de Gruyter GmbH

References (38 articles)

1. National Board of Osteopathic Medical Examiners. COMLEX-USA level 3 — NBOME. https://www.nbome.org/assessments/comlex-usa/comlex-usa-level-3/ [Accessed 20 Nov 2022].

2. American Educational Research Association. Standards for educational and psychological testing; 2014. http://www.aera.net/Publications/Books/Standards-for-Educational-Psychological-Testing-2014-Edition [Accessed 6 July 2018].

3. Messick, S. Meaning and values in test validation: the science and ethics of assessment. Educ Res 1989;18:5–11. https://doi.org/10.3102/0013189X018002005.

4. Kane, MT. The argument-based approach to validation. Sch Psychol Rev 2013;42:448–57. https://doi.org/10.1080/02796015.2013.12087465.

5. Kane, MT. Current concerns in validity theory. J Educ Meas 2001;38:319–42.
