Affiliation:
1. Università di Pisa, Pisa, Italy
2. Università del Piemonte Orientale, Alessandria, Italy
Abstract
We design two compressed data structures for the full-text indexing problem that support efficient substring searches using roughly the space required for storing the text in compressed form.
Our first compressed data structure retrieves the $\mathit{occ}$ occurrences of a pattern $P[1,p]$ within a text $T[1,n]$ in $O(p + \mathit{occ} \log^{1+\varepsilon} n)$ time for any chosen $\varepsilon$, $0 < \varepsilon < 1$. This data structure uses at most $5nH_k(T) + o(n)$ bits of storage, where $H_k(T)$ is the $k$th order empirical entropy of $T$. The space usage is $\Theta(n)$ bits in the worst case and $o(n)$ bits for compressible texts. This data structure exploits the relationship between suffix arrays and the Burrows-Wheeler Transform, and can be regarded as a compressed suffix array.
Our second compressed data structure achieves $O(p + \mathit{occ})$ query time using $O(nH_k(T) \log^{\varepsilon} n) + o(n)$ bits of storage for any chosen $\varepsilon$, $0 < \varepsilon < 1$. Therefore, it provides optimal output-sensitive query time using $o(n \log n)$ bits in the worst case. This second data structure builds upon the first one and exploits the interplay between two compressors: the Burrows-Wheeler Transform and the LZ78 algorithm.
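The relationship between suffix arrays and the Burrows-Wheeler Transform mentioned in the abstract can be illustrated with a small backward-search sketch. This is not the paper's data structure: the tables below are stored uncompressed, so the sketch shows only the search logic and none of the stated space bounds. The function names, the naive suffix sorting, and the `$` end-of-text sentinel are assumptions made for illustration, and positions are 0-based rather than the 1-based indexing used in the abstract.

```python
def suffix_array(text):
    # Naive construction by sorting suffixes; fine for a tiny illustration only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_sa(text, sa):
    # The i-th BWT symbol is the character preceding the i-th smallest suffix
    # (wrapping around to the last character for the suffix starting at 0).
    return "".join(text[i - 1] for i in sa)


def build_tables(bwt):
    # C[c]: number of symbols in the text that are strictly smaller than c.
    counts = {}
    for ch in bwt:
        counts[ch] = counts.get(ch, 0) + 1
    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]
    # occ[c][i]: number of occurrences of c in bwt[:i] (stored uncompressed here).
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return C, occ


def backward_search(pattern, C, occ, n):
    # Returns the half-open range [lo, hi) of suffix-array rows whose suffixes
    # start with `pattern`, processing the pattern from its last symbol to its first.
    lo, hi = 0, n
    for ch in reversed(pattern):
        if ch not in C:
            return 0, 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi


text = "mississippi$"  # "$" is an assumed sentinel smaller than every other symbol
sa = suffix_array(text)
bwt = bwt_from_sa(text, sa)
C, occ = build_tables(bwt)
lo, hi = backward_search("ssi", C, occ, len(text))
print(hi - lo, sorted(sa[lo:hi]))
```

Each pattern symbol costs one update of the row range, which is where the $p$ term in the query time comes from. In this sketch the occurrence positions are read from an explicit suffix array; the $\mathit{occ} \log^{1+\varepsilon} n$ term in the first data structure reflects the cost of locating them without storing that array in full.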
Publisher
Association for Computing Machinery (ACM)
Subject
Artificial Intelligence, Hardware and Architecture, Information Systems, Control and Systems Engineering, Software
Cited by
453 articles.