Author:
Arratia Richard,Morris Pricilla,Waterman Michael S.
Abstract
A derivation of a law of large numbers for the highest-scoring matching subsequence is given. Let Xk, Yk be i.i.d. q=(q(i))i∊S letters from a finite alphabet S and v=(v(i))i∊S be a sequence of non-negative real numbers assigned to the letters of S. Using a scoring system similar to that of the game Scrabble, the score of a word w=i1 · ·· im is defined to be V(w)=v(i1) + · ·· + v(im). Let Vn denote the value of the highest-scoring matching contiguous subsequence between X1X2 · ·· Xn and Y1Y2· ·· Yn. In this paper, we show that Vn/K log(n) → 1 a.s. where K ≡ K(q,v). The method employed here involves ‘stuttering’ the letters to construct a Markov chain and applying previous results for the length of the longest matching subsequence. An explicit form for β ∊Pr(S), where β (i) denotes the proportion of letter i found in the highest-scoring word, is given. A similar treatment for Markov chains is also included.Implicit in these results is a large-deviation result for the additive functional, H ≡ Σn < τv(Xn), for a Markov chain stopped at the hitting time τ of some state. We give this large deviation result explicitly, for Markov chains in discrete time and in continuous time.
Publisher
Cambridge University Press (CUP)
Subject
Statistics, Probability and Uncertainty,General Mathematics,Statistics and Probability
Cited by
34 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献
1. Pr[m]: An Algorithm for Protein Motif Discovery;IEEE/ACM Transactions on Computational Biology and Bioinformatics;2022-01-01
2. Matching strings in encoded sequences;Bernoulli;2020-08-01
3. MSuPDA: A Memory Efficient Algorithm for Sequence Alignment;Interdisciplinary Sciences: Computational Life Sciences;2015-08-08
4. An integrated algorithm for local sequence alignment;Network Modeling Analysis in Health Informatics and Bioinformatics;2014-08-19
5. Sequence Alignment as Hypothesis Testing;Journal of Computational Biology;2011-05