Position-specific evolution in transcription factor binding sites, and a fast likelihood calculation for the F81 model


Selvakumar Pavitra (1, 2), Siddharthan Rahul (1, 2)


1. The Institute of Mathematical Sciences, Chennai, India

2. Homi Bhabha National Institute, Mumbai, India


Transcription factor binding sites (TFBS), like other DNA sequences, evolve via mutation and selection related to their function. Models of nucleotide evolution describe DNA evolution via single-nucleotide mutations. A stationary vector of such a model is the long-term distribution of nucleotides, which the model leaves unchanged. Neutrally evolving sites may have uniform stationary vectors, but sites within a TFBS are instead expected to have stationary vectors that reflect the fitness of the various nucleotides at those positions. We introduce 'position-specific stationary vectors' (PSSVs), the collection of stationary vectors at each site in a TFBS locus, analogous to the position weight matrix (PWM) commonly used to describe TFBS. We infer PSSVs for human TFs using two evolutionary models, Felsenstein 1981 (F81) and Hasegawa-Kishino-Yano 1985 (HKY85). We find that PSSVs reflect the nucleotide distributions of the PWMs, but with reduced specificity. We infer ancestral nucleotide distributions at individual positions and calculate 'conditional PSSVs', conditioned on specific choices of the majority ancestral nucleotide. We find that certain ancestral nucleotides exert strong evolutionary pressure on neighbouring sequence, while others have a negligible effect. Finally, we present a fast likelihood calculation for the F81 model on moderate-sized trees that makes this approach feasible for large-scale studies along these lines.
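To make the role of a stationary vector concrete, the following is a minimal Python sketch, not the authors' code, of the F81 model with a site-specific stationary vector pi, combined with the standard Felsenstein pruning likelihood for a single alignment column. It assumes branch lengths in expected substitutions per site (hence the normalising constant beta = 1/(1 - sum of pi_i^2)) and a tree encoded as nested lists of (child, branch_length) pairs with nucleotide characters at the leaves. All names (`f81_beta`, `f81_transition`, `propagate`, `partial_likelihood`, `site_likelihood`) are hypothetical. Under F81, P(t) = e^(-beta t) I + (1 - e^(-beta t)) 1 pi^T, so pi P(t) = pi, which is why pi is the stationary vector. The same form means that sum_j P_ij(t) L_j = e^(-beta t) L_i + (1 - e^(-beta t)) sum_j pi_j L_j, replacing a 4x4 matrix-vector product per branch with one dot product. This is a well-known shortcut; the paper's fast likelihood calculation may work differently.

```python
import math

# Hypothetical helper names; a sketch under stated assumptions, not the authors' implementation.
NUCS = "ACGT"
# One-hot conditional likelihood vectors for observed leaf nucleotides.
LEAF_VECTORS = {n: [1.0 if m == n else 0.0 for m in NUCS] for n in NUCS}


def f81_beta(pi):
    """Rate scaling so that branch lengths are expected substitutions per site."""
    return 1.0 / (1.0 - sum(p * p for p in pi))


def f81_transition(pi, t):
    """Full 4x4 F81 matrix P(t) = e^(-beta t) I + (1 - e^(-beta t)) 1 pi^T."""
    e = math.exp(-f81_beta(pi) * t)
    return [[e * (i == j) + (1.0 - e) * pi[j] for j in range(4)] for i in range(4)]


def propagate(pi, t, child):
    """Compute sum_j P_ij(t) * child[j] for every i in O(4) via the F81 closed form."""
    e = math.exp(-f81_beta(pi) * t)
    mix = sum(p * c for p, c in zip(pi, child))  # sum_j pi_j * child[j]
    return [e * c + (1.0 - e) * mix for c in child]


def partial_likelihood(node, pi):
    """Felsenstein pruning: conditional likelihood vector at `node`.

    A leaf is a nucleotide character (anything else, e.g. '-', is treated as
    missing data); an internal node is a list of (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return LEAF_VECTORS.get(node, [1.0] * 4)
    result = [1.0] * 4
    for child, t in node:
        msg = propagate(pi, t, partial_likelihood(child, pi))
        result = [r * m for r, m in zip(result, msg)]
    return result


def site_likelihood(tree, pi):
    """Likelihood of one alignment column, with the root state drawn from pi."""
    return sum(p * l for p, l in zip(pi, partial_likelihood(tree, pi)))
```

As a usage example, a hypothetical TFBS position whose stationary vector favours G could be scored with `site_likelihood([("G", 0.1), ([("G", 0.2), ("A", 0.2)], 0.05)], [0.1, 0.1, 0.7, 0.1])`. Comparing such likelihoods across candidate vectors pi is one way to fit a stationary vector per position, which is the idea behind a PSSV; the recursion visits each branch once, so the cost per column grows linearly with tree size.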


Department of Atomic Energy, Government of India


The Royal Society


