Abstract
Motivation
Understanding how changes in non-coding DNA regulate gene expression remains a formidable challenge with profound implications for human genetics and disease research. Accurately predicting whether an expression quantitative trait locus (eQTL) up- or down-regulates gene expression could expedite the identification of associations between non-coding variants and phenotypes. However, although methods exist for predicting the impact of non-coding mutations on gene expression, the current state-of-the-art tool, Enformer, still cannot accurately predict the sign of eQTLs. Moreover, because of tissue specificity, existing methods require a separate model to be trained for each tissue type, which hinders extending prediction to single-cell resolution.
Results
In this work, we introduce EMO, a novel transformer-based pretrained method that predicts the up- and down-regulation of gene expression driven by single non-coding mutations from DNA sequences and ATAC-seq data. EMO extends the effective prediction range to 1 Mbp between the non-coding mutation and the transcription start site (TSS) of the affected gene, with competitive prediction performance across various sequence lengths, outperforming the retrained Enformer structures. We fine-tuned EMO on the eQTLs of two brain tissues to evaluate its robustness through external validation. We also evaluated how well EMO transfers to single-cell resolution by fine-tuning it on single-cell eQTLs from six immune cell types, achieving satisfactory performance in all cell types (AUC > 0.860). EMO also showed potential in handling disease-associated eQTLs.
Availability and implementation
The source code is freely available at https://github.com/Liuzhe30/EMO.
Publisher
Cold Spring Harbor Laboratory