Diversity in Renal Mass Data Cohorts: Implications for Urology AI Researchers

Authors:

Cen Harmony Selena, Dandamudi Siddhartha, Lei Xiaomeng, Weight Chris, Desai Mihir, Gill Inderbir, Duddalwar Vinay

Abstract

Introduction: We examine the heterogeneity and distribution of the cohort populations in two widely used public radiological image collections, the Cancer Genome Atlas Kidney Renal Clear Cell Carcinoma (TCIA TCGA KIRC) collection and the 2019 MICCAI Kidney Tumor Segmentation Challenge (KiTS19) dataset, and compare them, together with tertiary center data, with real-world renal cancer population data from the National Cancer Database (NCDB) Participant User Data File (PUF). PUF data serve as the anchor for assessing prevalence rate bias. Specific gene expression, and therefore the biology of renal cell carcinoma (RCC), differs by self-reported race, especially between the African American and Caucasian populations. AI algorithms learn from their training datasets; if a dataset misrepresents the population, the resulting model may learn and reinforce that bias. Ignoring these demographic features may produce inaccurate downstream results, limiting the translation of these analyses into clinical practice. Awareness of model training biases is therefore vital for patient care decisions when models are used in clinical settings.

Methods: Data elements evaluated included gender, other demographics, reported pathologic grade, and cancer stage. Risk was categorized using American Urological Association (AUA) risk levels. Poisson regression was used to estimate population-based and sample-specific prevalence rates with corresponding 95% confidence intervals. Data were analyzed with SAS 9.4.

Results: Compared with PUF, KiTS19 and TCGA KIRC oversampled Caucasian patients by 9.5% (95% CI, −3.7% to 22.7%) and 15.1% (95% CI, 1.5% to 28.8%), respectively, and undersampled African American patients, with differences of −6.7% (95% CI, −10% to −3.3%) and −5.5% (95% CI, −9.3% to −1.8%), respectively. The tertiary cohort also undersampled African American patients (−6.6%; 95% CI, −8.7% to −4.6%) and substantially undersampled aggressive cancers (−14.7%; 95% CI, −20.9% to −8.4%). No statistically significant difference in the rate of aggressive cancers was found among PUF, TCGA KIRC, and KiTS19; however, the heterogeneity in risk is notable.

Conclusion: Heterogeneity between cohorts needs to be considered in future AI training and cross-validation for renal masses.
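The Methods state only that Poisson regression was used to estimate prevalence rates with 95% confidence intervals (the authors used SAS 9.4). As a hedged illustration, not the authors' code, the Python sketch below fits an intercept-only Poisson model with a log link to a binary patient-level indicator (for example, a given self-reported race or an aggressive tumor). For that model the maximum-likelihood rate reduces to cases / n, and the 95% CI is built on the log scale from either a model-based or a robust (sandwich, "modified Poisson") standard error. The `prevalence_difference` helper, which compares a sample cohort with a reference cohort such as PUF using a Wald interval and assumes the two cohorts are independent, is our assumption and not necessarily how the reported differences were derived. All function names and counts are hypothetical.

```python
import math
from dataclasses import dataclass

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


@dataclass
class PrevalenceEstimate:
    rate: float      # estimated prevalence (proportion of patients)
    lower: float     # lower 95% confidence limit
    upper: float     # upper 95% confidence limit
    variance: float  # delta-method variance of the rate on the natural scale


def poisson_prevalence(cases: int, n: int, robust: bool = True) -> PrevalenceEstimate:
    """Hypothetical helper: intercept-only Poisson regression on a 0/1 indicator.

    With a log link, the MLE of the intercept is log(cases / n), so the fitted
    rate is simply cases / n. The CI is computed on the log scale and
    exponentiated, so it always stays positive.
    """
    if n <= 0 or not 0 < cases <= n:
        raise ValueError("need n > 0 and 0 < cases <= n")
    rate = cases / n
    log_rate = math.log(rate)
    if robust:
        # Sandwich variance of the intercept for binary outcomes: (1 - p) / cases
        var_log = (1.0 - rate) / cases
    else:
        # Model-based variance from the Poisson Fisher information: 1 / cases
        var_log = 1.0 / cases
    se_log = math.sqrt(var_log)
    lower = math.exp(log_rate - Z_95 * se_log)
    upper = math.exp(log_rate + Z_95 * se_log)
    # Delta method: Var(p) is approximately p^2 * Var(log p)
    return PrevalenceEstimate(rate, lower, upper, rate * rate * var_log)


def prevalence_difference(sample: PrevalenceEstimate, reference: PrevalenceEstimate):
    """Hypothetical helper: sample minus reference prevalence with a Wald 95% CI.

    Assumes the two cohorts are independent samples.
    """
    diff = sample.rate - reference.rate
    se = math.sqrt(sample.variance + reference.variance)
    return diff, diff - Z_95 * se, diff + Z_95 * se


if __name__ == "__main__":
    # Hypothetical counts for illustration only; these are not study data.
    sample = poisson_prevalence(cases=30, n=300)          # e.g. a public image cohort
    reference = poisson_prevalence(cases=1500, n=10000)   # e.g. a registry anchor
    diff, lo, hi = prevalence_difference(sample, reference)
    print(f"sample rate {sample.rate:.3f} (95% CI {sample.lower:.3f} to {sample.upper:.3f})")
    print(f"difference {diff:+.3f} (95% CI {lo:+.3f} to {hi:+.3f})")
```

With robust standard errors, the delta-method variance reduces to the familiar binomial p(1 - p) / n, while the model-based Poisson variance (p / n) is larger and therefore more conservative for binary data; that trade-off is one reason robust errors are often preferred when Poisson regression is applied to prevalence.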

Publisher

S. Karger AG

Subject

Cancer Research, Oncology, General Medicine
