maxATAC: Genome-scale transcription-factor binding prediction from ATAC-seq with deep neural networks
Published:2023-01-31
Issue:1
Volume:19
Page:e1010863
ISSN:1553-7358
Container-title:PLOS Computational Biology
language:en
Short-container-title:PLoS Comput Biol
Author:
Cazares Tareian A.,
Rizvi Faiz W.,
Iyer Balaji,
Chen Xiaoting,
Kotliar Michael,
Bejjani Anthony T.,
Wayman Joseph A.,
Donmez Omer,
Wronowski Benjamin,
Parameswaran Sreeja,
Kottyan Leah C.,
Barski Artem,
Weirauch Matthew T.,
Prasath V. B. Surya,
Miraldi Emily R.
Abstract
Transcription factors read the genome, fundamentally connecting DNA sequence to gene expression across diverse cell types. Determining how, where, and when TFs bind chromatin will advance our understanding of gene regulatory networks and cellular behavior. The 2017 ENCODE-DREAM in vivo Transcription-Factor Binding Site (TFBS) Prediction Challenge highlighted the value of chromatin accessibility data to TFBS prediction, establishing state-of-the-art methods for TFBS prediction from DNase-seq. However, the more recent Assay-for-Transposase-Accessible-Chromatin (ATAC)-seq has surpassed DNase-seq as the most widely-used chromatin accessibility profiling method. Furthermore, ATAC-seq is the only such technique available at single-cell resolution from standard commercial platforms. While ATAC-seq datasets grow exponentially, suboptimal motif scanning is unfortunately the most common method for TFBS prediction from ATAC-seq. To enable community access to state-of-the-art TFBS prediction from ATAC-seq, we (1) curated an extensive benchmark dataset (127 TFs) for ATAC-seq model training and (2) built “maxATAC”, a suite of user-friendly, deep neural network models for genome-wide TFBS prediction from ATAC-seq in any cell type. With models available for 127 human TFs, maxATAC is the largest collection of high-performance TFBS prediction models for ATAC-seq. maxATAC performance extends to primary cells and single-cell ATAC-seq, enabling improved TFBS prediction in vivo. We demonstrate maxATAC’s capabilities by identifying TFBS associated with allele-dependent chromatin accessibility at atopic dermatitis genetic risk loci.
Funder
National Institute of Allergy and Infectious Diseases
National Human Genome Research Institute
National Institute of Neurological Disorders and Stroke
National Institute of General Medical Sciences
National Institute of Arthritis and Musculoskeletal and Skin Diseases
National Institute of Diabetes and Digestive and Kidney Diseases
Cincinnati Children’s Research Foundation
Publisher
Public Library of Science (PLoS)
Subject
Computational Theory and Mathematics,Cellular and Molecular Neuroscience,Genetics,Molecular Biology,Ecology,Modeling and Simulation,Ecology, Evolution, Behavior and Systematics
Cited by
5 articles.