Correcting batch effects in large-scale multiomics studies using a reference-material-based ratio method

Published: 2023-09-07
Volume: 24, Issue: 1
Journal: Genome Biology (Genome Biol)
ISSN: 1474-760X
Language: English

Authors

Yu Ying, Zhang Naixin, Mai Yuanbang, Ren Luyao, Chen Qiaochu, Cao Zehui, Chen Qingwang, Liu Yaqing, Hou Wanwan, Yang Jingcheng, Hong Huixiao, Xu Joshua, Tong Weida, Dong Lianhua, Shi Leming, Fang Xiang, Zheng Yuanting

Abstract

Background

Batch effects are notoriously common technical variations in multiomics data and may lead to misleading conclusions if left uncorrected or if over-corrected. Many batch-effect correction algorithms have been proposed to facilitate data integration. However, their respective advantages and limitations have not been adequately assessed across omics types, performance metrics, and application scenarios.

Results

As part of the Quartet Project for quality control and data integration of multiomics profiling, we comprehensively assess the performance of seven batch-effect correction algorithms using performance metrics of clinical relevance: the accuracy of identifying differentially expressed features, the robustness of predictive models, and the ability to correctly cluster samples from different batches by their donor of origin. The ratio-based method, which scales the absolute feature values of study samples relative to those of concurrently profiled reference material(s), is found to be much more effective and broadly applicable than the others, especially when batch effects are completely confounded with the biological factors of interest. We further provide practical guidelines for implementing the ratio-based approach in increasingly large-scale multiomics studies.

Conclusions

Multiomics measurements are prone to batch effects, which can be effectively corrected by ratio-based scaling of the multiomics data. Our study lays the foundation for eliminating batch effects at a ratio scale.
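The ratio-based scaling summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `ratio_scale`, the log2 transform, the `pseudocount` offset, and the use of the per-batch mean of reference-material replicates as the denominator are all assumptions made for illustration. On a log2 scale each ratio becomes a simple difference.

```python
import math
from collections import defaultdict


def ratio_scale(values, batches, is_reference, pseudocount=1.0):
    """Hypothetical sketch of reference-material-based ratio scaling.

    values       -- per-sample feature vectors of non-negative abundances
    batches      -- batch label for each sample
    is_reference -- True where the sample is a reference-material replicate
    pseudocount  -- assumed offset so that zero abundances can be log-transformed

    Returns {sample_index: [log2 ratio per feature]} for study samples only.
    """
    # Assumption: work on a log2 scale, where a ratio becomes a difference.
    logged = [[math.log2(v + pseudocount) for v in row] for row in values]

    # Group the log2 profiles of reference-material replicates by batch.
    ref_rows = defaultdict(list)
    for row, batch, ref in zip(logged, batches, is_reference):
        if ref:
            ref_rows[batch].append(row)

    # Assumption: the per-feature reference level in each batch is the mean of
    # the replicates' log2 values (a geometric mean on the original scale).
    ref_level = {
        batch: [sum(col) / len(col) for col in zip(*rows)]
        for batch, rows in ref_rows.items()
    }

    corrected = {}
    for i, (row, batch, ref) in enumerate(zip(logged, batches, is_reference)):
        if ref:
            continue  # reference samples only serve as the denominator
        if batch not in ref_level:
            raise ValueError(f"batch {batch!r} has no reference-material sample")
        # log2(study) - log2(reference) == log2(study / reference)
        corrected[i] = [x - r for x, r in zip(row, ref_level[batch])]
    return corrected


# Toy data: batch B reads every feature at 4x the level of batch A.
values = [[8, 4], [8, 4], [16, 8], [32, 16], [64, 32]]
batches = ["A", "A", "A", "B", "B"]
is_reference = [True, True, False, True, False]

result = ratio_scale(values, batches, is_reference, pseudocount=0.0)
print(result)  # {2: [1.0, 1.0], 4: [1.0, 1.0]}
```

In the toy data the two study samples differ fourfold in raw abundance purely because of their batch, yet both end up with identical log2 ratios, because the reference material profiled in each batch absorbs the shared multiplicative shift. The sketch therefore assumes the batch effect acts on study and reference samples alike, and it refuses to scale a batch that contains no reference replicate, since the ratio is undefined there.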

Funders

Key Research and Development Project of Hainan Province

National Mega Project on Major Infectious Disease Prevention

National Natural Science Foundation of China

State Key Laboratory of Genetic Engineering

111 Project

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1186/s13059-023-03047-z.pdf

References (62 articles)

1. Su Z, Łabaj PP, Li S, Thierry-Mieg J, Thierry-Mieg D, Shi W, Wang C, Schroth GP, Setterquist RA, Thompson JF, et al. A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium. Nat Biotechnol. 2014;32:903–14.