Gene set enrichment for reproducible science: comparison of CERNO and eight other algorithms

Author:

Zyla Joanna (1, 2), Marczyk Michal (1, 3), Domaszewska Teresa (2), Kaufmann Stefan H E (2), Polanska Joanna (1), Weiner January (2)

Affiliation:

1. Data Mining Group, Faculty of Automatic Control, Electronic and Computer Science, Institute of Automatic Control, Silesian University of Technology, Gliwice, Poland

2. Department of Immunology, Max Planck Institute for Infection Biology, Berlin, Germany

3. Yale School of Medicine, Yale Cancer Center, New Haven, CT 06510, USA

Abstract

Motivation: Analysis of gene set (GS) enrichment is an essential part of functional omics studies. Here, we complement the established evaluation metrics of GS enrichment algorithms with a novel approach to assess the practical reproducibility of scientific results obtained from GS enrichment tests when applied to related data from different studies.

Results: We evaluated eight established algorithms and one novel algorithm for reproducibility, sensitivity, prioritization, false positive rate and computational time. The novel algorithm is Coincident Extreme Ranks in Numerical Observations (CERNO), a flexible and fast algorithm based on modified Fisher P-value integration. Using real-world datasets, we demonstrate that CERNO is robust to ranking metrics, as well as to sample and GS size. CERNO had the highest reproducibility while remaining sensitive, specific and fast. In the overall ranking, Pathway Analysis with Down-weighting of Overlapping Genes, CERNO and over-representation analysis performed best, while CERNO and GeneSetTest scored high in terms of reproducibility.

Availability and implementation: The tmod package implementing the CERNO algorithm is available from CRAN (cran.r-project.org/web/packages/tmod/index.html), and an online implementation can be found at http://tmod.online/. The datasets analyzed in this study are widely available in the KEGGdzPathwaysGEO and KEGGandMetacoreDzPathwaysGEO R packages and the GEO repository.

Supplementary information: Supplementary data are available at Bioinformatics online.
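The abstract describes CERNO only as "modified Fisher P-value integration". As a rough illustration, the sketch below assumes the statistic takes the form F = -2 Σ ln(r_i / N), where r_i are the ranks of the gene set's members in an ordered list of N genes, and that F is compared against a chi-squared distribution with 2k degrees of freedom for a set of k genes. That form, the function name `cerno_test` and its parameters are assumptions made for this example. They are not taken from the abstract, and this is not the tmod implementation.

```python
import math


def cerno_test(ranks, n_genes):
    """Hypothetical sketch of a Fisher-style rank integration test.

    ranks   -- 1-based ranks of the gene-set members in the ordered gene list
    n_genes -- total number of genes in the ordered list (N)

    Returns (statistic, p_value), assuming F = -2 * sum(ln(r_i / N)) follows
    a chi-squared distribution with 2k degrees of freedom under the null.
    """
    k = len(ranks)
    if k == 0:
        raise ValueError("gene set has no members in the ranked list")
    if any(r < 1 or r > n_genes for r in ranks):
        raise ValueError("ranks must lie between 1 and n_genes")

    # Half of the statistic: each rank is rescaled to (0, 1], as a p-value
    # would be in Fisher's method.
    half_stat = -sum(math.log(r / n_genes) for r in ranks)

    # For an even number of degrees of freedom (2k), the chi-squared upper
    # tail has a closed form: exp(-h) * sum_{j=0}^{k-1} h^j / j!, h = F / 2.
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half_stat / j
        total += term
    p_value = math.exp(-half_stat) * total

    return 2.0 * half_stat, p_value


if __name__ == "__main__":
    # Gene set concentrated near the top of a 1000-gene list.
    print(cerno_test([3, 10, 25, 40], 1000))
    # Gene set scattered towards the bottom of the same list.
    print(cerno_test([700, 820, 910, 990], 1000))
```

Under these assumptions, a gene set whose members cluster near the top of the list yields a large statistic and a small P-value, while a set near the bottom does not. The closed-form tail avoids an external statistics library, but it is only adequate for small illustrative inputs.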

Funder

Bill & Melinda Gates Foundation Grand Challenges in Global Health Program

BioVacSafe

Polish National Science Center

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability

