A comparison of univariate and multivariate gene selection techniques for classification of cancer datasets

Published:2006-05-02 Issue:1 Volume:7 Page:
ISSN:1471-2105
Container-title:BMC Bioinformatics
language:en
Short-container-title:BMC Bioinformatics

Author:

Lai Carmen, Reinders Marcel JT, van't Veer Laura J, Wessels Lodewyk FA

Abstract

Background: Gene selection is an important step when building predictors of disease state from gene expression data. It generally improves performance and identifies a relevant subset of genes. Many univariate and multivariate gene selection approaches have been proposed. It is frequently claimed that, because genes are co-regulated (due to pathway dependencies), multivariate approaches are by definition more desirable than univariate ones. A fair comparison cannot be made from the published performances of these approaches, for two main reasons. First, the validation set is often involved, in one way or another, in training the predictor, which yields optimistically biased performance estimates. Second, the published results are often based on a small number of relatively simple datasets. Consequently, no generally applicable conclusions can be drawn.

Results: In this study we adopted an unbiased protocol to perform a fair comparison of frequently used multivariate and univariate gene selection techniques, in combination with a range of classifiers. Our conclusions are based on seven gene expression datasets, across several cancer types.

Conclusion: Our experiments illustrate that, contrary to several previous studies, in five of the seven datasets univariate selection approaches yield consistently better results than multivariate approaches. The simplest multivariate selection approach, the Top Scoring method, achieves the best results on the remaining two datasets. We conclude that the correlation structures, if present, are difficult to extract due to the small number of samples, and that, consequently, overly complex gene selection algorithms that attempt to extract these structures are prone to overtraining.
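The bias described in the abstract (validation samples taking part in training) can be illustrated with a minimal Python sketch. This is not the protocol used in the paper: the signal-to-noise score, the nearest-mean classifier, the fold layout, and all function names are assumptions chosen only to keep the example short. It compares selecting genes inside each cross-validation fold with selecting them once on all samples.

```python
import random
import statistics


def signal_to_noise(values, labels):
    """Univariate score for one gene: |mean0 - mean1| / (sd0 + sd1)."""
    a = [v for v, t in zip(values, labels) if t == 0]
    b = [v for v, t in zip(values, labels) if t == 1]
    spread = statistics.pstdev(a) + statistics.pstdev(b)
    return abs(statistics.fmean(a) - statistics.fmean(b)) / (spread + 1e-12)


def select_top_genes(X, y, k):
    """Rank genes by univariate score, using only the samples passed in."""
    n_genes = len(X[0])
    scores = [signal_to_noise([row[g] for row in X], y) for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: -scores[g])[:k]


def nearest_mean_predict(X_train, y_train, x, genes):
    """Assign x to the class whose mean profile (on the selected genes) is closest."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        rows = [r for r, t in zip(X_train, y_train) if t == label]
        dist = sum(
            (x[g] - statistics.fmean([r[g] for r in rows])) ** 2 for g in genes
        )
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def cross_validated_accuracy(X, y, k, n_folds, select_inside_fold):
    """Cross-validated accuracy with gene selection inside or outside the folds."""
    n = len(X)
    # Biased variant: genes are chosen once, using every sample (test folds included).
    global_genes = None if select_inside_fold else select_top_genes(X, y, k)
    correct = 0
    for fold in range(n_folds):
        test_idx = [i for i in range(n) if i % n_folds == fold]
        train_idx = [i for i in range(n) if i % n_folds != fold]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        # Unbiased variant: the test fold never influences which genes are kept.
        genes = (
            select_top_genes(X_train, y_train, k)
            if select_inside_fold
            else global_genes
        )
        for i in test_idx:
            correct += nearest_mean_predict(X_train, y_train, X[i], genes) == y[i]
    return correct / n


if __name__ == "__main__":
    rng = random.Random(0)
    n_samples, n_genes = 40, 200
    # Pure noise: the labels carry no information about the expression values.
    X = [[rng.gauss(0, 1) for _ in range(n_genes)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    print("selection inside folds :", cross_validated_accuracy(X, y, 10, 5, True))
    print("selection on all data  :", cross_validated_accuracy(X, y, 10, 5, False))
```

On noise data like this, selecting genes on all samples typically reports accuracy well above chance even though there is nothing to learn, while selection inside the folds stays near chance. The gap is the optimistic bias the abstract refers to.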

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Structural Biology

Link

https://link.springer.com/content/pdf/10.1186/1471-2105-7-235.pdf

References: 45 articles.

1. Kohavi R, John GH: Wrappers for Feature Subset Selection. Artificial Intelligence 1997, 97:273–324.

2. Tsamardinos I, Aliferis CF: Towards Principled Feature Selection: Relevancy, Filters and Wrappers. Ninth International Workshop on Artificial Intelligence and Statistics 2003.

3. Ein-Dor L, Kela I, Getz G, Givol D, Domany E: Outcome signature genes in breast cancer: is there a unique set? Bioinformatics 2004., (12):

4. Ben-Dor A, Bruhn L, Friedman N, Nachman I, Schummer M, Yakhini Z: Tissue classification with gene expression profiles. In Proceedings of the fourth annual international Conference on Computational molecular biology. Tokyo, Japan: ACM Press; 2000:54–64.

5. Blanco R, Larranaga P, Inza I, Sierra B: Gene selection for cancer classification using wrapper approaches. International Journal of Pattern Recognition and Artificial Intelligence 2004, 18(8):1373–1390.

Cited by 92 articles.

1. Enhancing Feature Selection Optimization for COVID-19 Microarray Data;COVID;2023-09-04

2. Small patient datasets reveal genetic drivers of non-small cell lung cancer subtypes using machine learning for hypothesis generation;Exploration of Medicine;2023-07-26

3. A statistical perspective of gene set analysis with trait-specific QTL in molecular crop breeding;QTL Mapping in Crop Improvement;2023

4. A machine learning application in wine quality prediction;Machine Learning with Applications;2022-06

5. Machine learning model for classification of predominantly allergic and non-allergic asthma among preschool children with asthma hospitalization;Journal of Asthma;2022-04-07