XML stream processing using tree-edit distance embeddings

Published:2005-03 Issue:1 Volume:30 Page:279-332
ISSN:0362-5915
Container-title:ACM Transactions on Database Systems
Language:en
Short-container-title:ACM Trans. Database Syst.

Author:

Garofalakis Minos¹, Kumar Amit²

Affiliation:

1. Bell Labs, Lucent Technologies, Murray Hill, NJ

2. Indian Institute of Technology, New Delhi, India

Abstract

We propose the first known solution to the problem of correlating, in small space, continuous streams of XML data through approximate (structure and content) matching, as defined by a general tree-edit distance metric. The key element of our solution is a novel algorithm for obliviously embedding tree-edit distance metrics into an $L_1$ vector space while guaranteeing a (worst-case) upper bound of $O(\log^2 n \log^* n)$ on the distance distortion between any data trees with at most $n$ nodes. We demonstrate how our embedding algorithm can be applied in conjunction with known random sketching techniques to (1) build a compact synopsis of a massive, streaming XML data tree that can be used as a concise surrogate for the full tree in approximate tree-edit distance computations; and (2) approximate the result of tree-edit-distance similarity joins over continuous XML document streams. Experimental results from an empirical study with both synthetic and real-life XML data trees validate our approach, demonstrating that the average-case behavior of our embedding techniques is much better than what would be predicted from our theoretical worst-case distortion bounds. To the best of our knowledge, these are the first algorithmic results on low-distortion embeddings for tree-edit distance metrics, and on correlating (e.g., through similarity joins) XML data in the streaming model.
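The abstract pairs the embedding with "known random sketching techniques" for $L_1$ distances. The snippet below is a minimal sketch of one such standard technique, a Cauchy (1-stable) random projection with a median estimator. It is not the paper's embedding algorithm, and it may not be the exact sketching scheme the authors use. It assumes each tree has already been mapped to a sparse $L_1$ vector; the names `cauchy_sketch` and `estimate_l1` and the dictionary representation are hypothetical, chosen for illustration only.

```python
import math
import random
from statistics import median


def cauchy_sketch(vec, k, seed):
    """Project a sparse vector (dict: coordinate -> value) to k dimensions.

    Each coordinate gets its own row of k standard Cauchy variates, derived
    deterministically from (seed, coordinate), so two vectors sketched with
    the same seed share the same random projection.
    """
    sketch = [0.0] * k
    for coord, value in vec.items():
        rng = random.Random(f"{seed}:{coord}")
        for j in range(k):
            # Inverse-CDF sampling of a standard Cauchy variate.
            sketch[j] += value * math.tan(math.pi * (rng.random() - 0.5))
    return sketch


def estimate_l1(sketch_a, sketch_b):
    """Estimate the L1 distance between the two original vectors.

    Each difference of sketch entries is Cauchy-distributed with scale equal
    to the true L1 distance, and the median of |standard Cauchy| is 1, so
    the median of the absolute differences estimates the distance.
    """
    return median(abs(a - b) for a, b in zip(sketch_a, sketch_b))


# Illustrative vectors (assumed to be the images of two trees under some embedding).
u = {"a": 3.0, "b": 1.0, "c": 2.0}
v = {"a": 1.0, "c": 5.0, "d": 4.0}
approx = estimate_l1(cauchy_sketch(u, 1000, seed=7), cauchy_sketch(v, 1000, seed=7))
```

Because the projection is linear, a sketch of this kind can be updated one coordinate at a time as stream elements arrive, which is what makes it a candidate for the small-space setting the abstract describes.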

Publisher

Association for Computing Machinery (ACM)

Subject

Information Systems

Link

https://dl.acm.org/doi/pdf/10.1145/1061318.1061326

References: 53 articles.

1. Join synopses for approximate query answering

2. Tracking join and self-join sizes in limited storage

3. The space complexity of approximating the frequency moments

Cited by 32 articles.

1. Structure-Preserving Hashing for Tree-Structured Data;Signal, Image and Video Processing;2022-03-13

2. New and improved algorithms for unordered tree inclusion;Theoretical Computer Science;2021-09

3. Approximate consistency for transformations on words and trees;Theoretical Computer Science;2016-05

4. Tree edit distance: Robust and memory-efficient;Information Systems;2016-03

5. Similar Subtree Search Using Extended Tree Inclusion;IEEE Transactions on Knowledge and Data Engineering;2015-12-01