Metrics Used for the Evaluation of Chatbots Providing Cancer Genetic Risk Assessment and Education: A Systematic Review (Preprint)

Author:

Scalia Jennifer L, Laprise Jessica L, Thrift Jason R, Farrell Christopher L, Sarasua Sara M

Abstract

BACKGROUND

Chatbots have recently emerged as an alternative approach for delivering cancer risk assessment and genetic counseling. Understanding the metrics used to describe the user-chatbot experience highlights the strengths and weaknesses of AI-assisted healthcare applications and helps ensure safe and reliable medical care. While research supports the use of chatbots in cancer genetic risk assessment and counseling, the evaluation measures remain inconsistent and unsystematic.

OBJECTIVE

This systematic review analyzes the metrics used to evaluate chatbot platforms providing cancer genetic risk assessment and pre-test and post-test genetic education. We examine these measures to identify potential limitations and inform a more systematic evaluative approach.

METHODS

A comprehensive search was conducted using three databases: PubMed, Web of Science, and Engineering Village. Articles were screened and analyzed using the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) framework. Study and chatbot characteristics were documented, along with variables affecting metric use. Metrics evaluating the user-chatbot experience were extracted, categorized into domains, and organized within the Reach, Effectiveness, Adoption, Implementation, and Maintenance (RE-AIM) framework to identify assessment gaps and insights regarding application and effectiveness.
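The categorization step described above can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' procedure: the study identifiers, metric names, and the assignment of each metric to a domain and RE-AIM dimension are hypothetical placeholders, and the helper `summarize` is an assumed name introduced here.

```python
from collections import Counter
from statistics import median

# The five RE-AIM dimensions named in the abstract.
RE_AIM = ["Reach", "Effectiveness", "Adoption", "Implementation", "Maintenance"]

# Hypothetical extraction records (NOT data from the review):
# (study_id, metric, domain, RE-AIM dimension)
records = [
    ("study_a", "satisfaction", "user experience", "Adoption"),
    ("study_a", "knowledge quiz score", "knowledge acquisition", "Effectiveness"),
    ("study_a", "completion rate", "outcomes and behaviors", "Reach"),
    ("study_b", "usability rating", "user experience", "Adoption"),
    ("study_b", "response accuracy", "technical performance", "Implementation"),
]

def summarize(records):
    """Tally metrics per study, per domain, and per RE-AIM dimension."""
    per_study = Counter(study for study, _, _, _ in records)
    by_domain = Counter(domain for _, _, domain, _ in records)
    by_dimension = Counter(dim for _, _, _, dim in records)
    return {
        # Median number of metrics used per study.
        "median_metrics_per_study": median(per_study.values()),
        # Domains ordered from most to least frequently measured.
        "domain_ranking": [d for d, _ in by_domain.most_common()],
        # RE-AIM dimensions that no extracted metric addresses.
        "re_aim_gaps": [d for d in RE_AIM if by_dimension[d] == 0],
    }

summary = summarize(records)
print(summary)
```

With real extraction data, the `re_aim_gaps` list would point to dimensions that no study measured, and `domain_ranking` would show which domains dominate the evaluations.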

RESULTS

The database search retrieved 684 citations, with 11 articles meeting the inclusion criteria. The studies varied in methodologies, research settings, chatbot functionalities, and participant characteristics. A total of 104 measures were extracted and categorized into 16 groups, with each study using 2 to 22 metrics (median of 8). The measurement groups were organized into five domains: user experience, knowledge acquisition, outcomes and behaviors, emotional response, and technical performance, with user experience measures being the most common. Notably, despite the educational purpose of AI-assisted genetic counseling, the knowledge acquisition domain ranked third in metric usage. Mapping the study metrics to the RE-AIM framework showed that they addressed all five of its dimensions, and also highlighted user-centric measures omitted from chatbot evaluations, including accuracy, transparency, data privacy, and educational continuity.

CONCLUSIONS

The limited number of studies on automated cancer genetic risk assessment and education showed significant variability in the metrics used. A unified evaluation process is essential for accurately assessing chatbot effectiveness. Measures of the knowledge that users gain are important, yet they currently receive only moderate emphasis. Expanding educational metrics will strengthen the informed consent process and empower patients in their healthcare decisions. Additionally, recognizing confounding variables and using frameworks such as RE-AIM can help ensure that appropriate measures are properly implemented and not overlooked. These strategies will ultimately promote the safe and effective use of novel genetic services.

CLINICALTRIAL

N/A

Publisher

JMIR Publications Inc.
