Radiomics feature analysis and model research for predicting histopathological subtypes of non‐small cell lung cancer on CT images: A multi‐dataset study

Author:

Song Fan1, Song Xiao2, Feng Youdan1, Fan Guangda1, Sun Yangyang1, Zhang Peng1, Li Jinkai3, Liu Fei4, Zhang Guanglei1

Affiliation:

1. Beijing Advanced Innovation Center for Biomedical Engineering, School of Biological Science and Medical Engineering, Beihang University, Beijing, China

2. School of Medical Imaging, Shanxi Medical University, Taiyuan, China

3. School of General Engineering, Beihang University, Beijing, China

4. Beijing Advanced Information & Industrial Technology Research Institute, Beijing Information Science & Technology University, Beijing, China

Abstract

Purpose: Classifying the subtypes of non‐small cell lung cancer (NSCLC) is essential for choosing optimal treatment strategies and improving clinical outcomes, but at present the histological subtypes are confirmed by invasive biopsy or post‐operative examination. Using multi‐center data, this study aimed to analyze the importance of extracted CT radiomics features and to develop a model with good generalization performance for distinguishing the major NSCLC subtypes: adenocarcinoma (ADC) and squamous cell carcinoma (SCC).

Methods: We collected a multi‐center CT dataset of 868 patients from eight international databases in The Cancer Imaging Archive (TCIA). Patients from five databases were pooled and split into training and test sets (560:140). The remaining three databases provided two independent test sets: the TCGA set (n = 97) and the lung3 set (n = 71). A total of 1409 features containing shape, intensity, and texture information were extracted from the tumor volume of interest (VOI); ℓ2,1‐norm minimization was then used for feature selection, and the importance of the selected features was analyzed. Next, the prediction and generalization performance of 130 radiomics models (10 common algorithms and 120 heterogeneous ensemble combinations) was compared using the average AUC value on the three test sets. Finally, the predictive results of the optimal model were reported.

Results: After feature selection, 401 features remained. Intensity features and the GLCM, GLRLM, and GLSZM texture features had higher classification weight coefficients than the other features (shape features and the GLDM and NGTDM texture features), and filtered‐image features were significantly more important than original‐image features (p‐value = 0.0210). Moreover, the five ensemble learning algorithms (Bagging, AdaBoost, RF, XGBoost, GBDT) generalized better (p‐value = 0.00418) than the five non‐ensemble algorithms (MLP, LR, GNB, SVM, KNN). The Bagging‐AdaBoost‐SVM model had the highest average AUC value (0.815 ± 0.010) across the three test sets, with AUC values of 0.819, 0.823, and 0.804 on the test set, TCGA set, and lung3 set, respectively.

Conclusion: Our multi‐dataset study showed that intensity features, texture features (GLCM, GLRLM, and GLSZM), and filtered‐image features were more important for distinguishing ADCs from SCCs. Ensemble learning can improve prediction and generalization performance on complicated multi‐center data. The Bagging‐AdaBoost‐SVM model had the strongest generalization performance and showed promising clinical value for non‐invasively predicting the histopathological subtypes of NSCLC.
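
The figures reported in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is not the authors' code: the variable names are illustrative, and using the sample standard deviation (n - 1 denominator) for the "±" value is an assumption, chosen because it is the convention that matches the reported 0.815 ± 0.010.

```python
from statistics import mean, stdev

# Cohort sizes as reported in the abstract (names are illustrative).
cohort = {"training": 560, "test": 140, "TCGA": 97, "lung3": 71}
total_patients = sum(cohort.values())

# Feature counts before and after l2,1-norm feature selection.
n_extracted, n_selected = 1409, 401
retained_fraction = n_selected / n_extracted

# AUC values reported for the Bagging-AdaBoost-SVM model on each test set.
auc = {"test": 0.819, "TCGA": 0.823, "lung3": 0.804}
auc_values = list(auc.values())
auc_mean = mean(auc_values)
# Assumption: the reported spread is the sample SD (n - 1 denominator).
auc_sd = stdev(auc_values)

print(f"patients: {total_patients}")
print(f"features retained: {retained_fraction:.1%}")
print(f"AUC: {auc_mean:.3f} ± {auc_sd:.3f}")
```

Under these assumptions the four cohort sizes sum to the 868 patients stated in the abstract, and the three per-set AUC values reproduce the reported summary.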

Funder

Natural Science Foundation of Beijing Municipality

National Natural Science Foundation of China

Higher Education Discipline Innovation Project

Publisher

Wiley

Subject

General Medicine
