Metric space similarity joins

Published: 2008-06, Volume: 33, Issue: 2, Pages: 1-38
ISSN: 0362-5915
Journal: ACM Transactions on Database Systems
Language: English
Journal abbreviation: ACM Trans. Database Syst.

Author:

Edwin H. Jacox¹, Hanan Samet¹

Affiliation:

1. University of Maryland, College Park, MD

Abstract

Similarity join algorithms find pairs of objects that lie within a certain distance ϵ of each other. Algorithms that are adapted from spatial join techniques are designed primarily for data in a vector space and often employ some form of a multidimensional index. For these algorithms, when the data lies in a metric space, the usual solution is to embed the data in a vector space and then make use of a multidimensional index. Such an approach has a number of drawbacks when the data is high-dimensional, as we must eventually find the most discriminating dimensions, which is not trivial. In addition, although the maximum distance between objects increases with dimension, the ability to discriminate between objects in each dimension does not. These drawbacks are overcome via the introduction of a new method called Quickjoin that does not require a multidimensional index and instead adapts techniques used in distance-based indexing for use in a method that is conceptually similar to the Quicksort algorithm. A formal analysis of the Quickjoin method is provided. Experiments show that the Quickjoin method significantly outperforms two existing techniques.
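The partition-and-recurse idea summarized in the abstract can be illustrated with a small Python sketch. It is an approximation written for this page, not the authors' published pseudocode. It assumes ball partitioning only (a random pivot `p1` with radius `r = dist(p1, p2)` for a second random object `p2`), a hypothetical nested-loop cutoff `SMALL = 8`, and two guards added here (an `r == 0` check and a no-progress check) so that duplicate objects cannot cause endless recursion. Objects whose distance to the pivot falls within `eps` of the boundary `r` are also copied into "window" sets. By the triangle inequality, any qualifying pair that straddles the boundary must lie in those windows, so `join_win` handles such pairs while `join` recurses into each side of the pivot, loosely as Quicksort does. The names `quickjoin`, `join`, `join_win`, and `levenshtein` are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Set, Tuple, TypeVar

T = TypeVar("T")
Pair = Tuple[int, int]
SMALL = 8  # hypothetical cutoff: below this size, fall back to a nested loop


def quickjoin(objs: Sequence[T], eps: float,
              dist: Callable[[T, T], float], seed: int = 0) -> Set[Pair]:
    """Sketch: all index pairs (i, j), i < j, with dist(objs[i], objs[j]) <= eps.
    `dist` must be a metric; the window logic relies on the triangle inequality."""
    rng = random.Random(seed)
    out: Set[Pair] = set()

    def d(i: int, j: int) -> float:
        return dist(objs[i], objs[j])

    def emit(i: int, j: int) -> None:
        out.add((min(i, j), max(i, j)))

    def nested_loop(ids: List[int]) -> None:  # all qualifying pairs inside one set
        for a in range(len(ids)):
            for b in range(a + 1, len(ids)):
                if d(ids[a], ids[b]) <= eps:
                    emit(ids[a], ids[b])

    def nested_loop2(ids1: List[int], ids2: List[int]) -> None:  # pairs across two sets
        for i in ids1:
            for j in ids2:
                if d(i, j) <= eps:
                    emit(i, j)

    def partition(ids: List[int], p: int, r: float):
        """Ball partition around pivot p with radius r, plus eps-wide windows."""
        part_l, part_g, win_l, win_g = [], [], [], []
        for o in ids:
            dp = d(o, p)
            if dp < r:
                part_l.append(o)
                if dp >= r - eps:  # inner window: may match an object outside the ball
                    win_l.append(o)
            else:
                part_g.append(o)
                if dp <= r + eps:  # outer window: may match an object inside the ball
                    win_g.append(o)
        return part_l, part_g, win_l, win_g

    def join(ids: List[int]) -> None:
        if len(ids) < SMALL:
            nested_loop(ids)
            return
        p1, p2 = rng.sample(ids, 2)
        r = d(p1, p2)
        if r == 0:  # sketch-only guard: duplicate pivots would not shrink the set
            nested_loop(ids)
            return
        part_l, part_g, win_l, win_g = partition(ids, p1, r)
        join_win(win_l, win_g)  # pairs straddling the ball boundary
        join(part_l)            # pairs entirely inside the ball
        join(part_g)            # pairs entirely outside the ball

    def join_win(ids1: List[int], ids2: List[int]) -> None:
        total = len(ids1) + len(ids2)
        if total < SMALL or not ids1 or not ids2:
            nested_loop2(ids1, ids2)
            return
        p1, p2 = rng.sample(ids1 + ids2, 2)
        r = d(p1, p2)
        if r == 0:
            nested_loop2(ids1, ids2)
            return
        pl1, pg1, wl1, wg1 = partition(ids1, p1, r)
        pl2, pg2, wl2, wg2 = partition(ids2, p1, r)
        for a, b in ((wl1, wg2), (wg1, wl2), (pl1, pl2), (pg1, pg2)):
            if len(a) + len(b) < total:
                join_win(a, b)
            else:  # sketch-only guard: no progress, so finish this sub-problem directly
                nested_loop2(a, b)

    join(list(range(len(objs))))
    return out


def levenshtein(a: str, b: str) -> int:
    """Edit distance, a metric on strings (no vector embedding needed)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


words = ["cat", "cart", "card", "dog", "dot", "cot"]
print(sorted(quickjoin(words, 1, levenshtein)))  # [(0, 1), (0, 5), (1, 2), (3, 4), (4, 5)]
```

Because the sketch only ever calls `dist`, it works unchanged on strings under edit distance, which is the metric-space setting the abstract contrasts with embedding data into a vector space. Each recursion step costs one pass of distance computations over the current set, which is where the Quicksort analogy comes from.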

Funder

Division of Computing and Communication Foundations

National Science Foundation

Division of Information and Intelligent Systems

Publisher

Association for Computing Machinery (ACM)

Subject

Information Systems

Link

https://dl.acm.org/doi/pdf/10.1145/1366102.1366104

References: 69 articles.

1. Towards systematic design of distance functions for data mining applications

2. Hashing by proximity to process duplicates in spatial databases

Cited by 84 articles.

1. MetricJoin: Leveraging Metric Properties for Robust Exact Set Similarity Joins; 2023 IEEE 39th International Conference on Data Engineering (ICDE); 2023-04

2. Diversity Similarity Join for Big Data; Similarity Search and Applications; 2023

3. Adding Result Diversification to kNN-Based Joins in a Map-Reduce Framework; Lecture Notes in Computer Science; 2023

4. Assessing the Existence of a Function in a Dataset with the g3 Indicator; 2022 IEEE 38th International Conference on Data Engineering (ICDE); 2022-05

5. Parallelizing filter-and-verification based exact set similarity joins on multicores; Information Systems; 2021-10