Affiliation:
1. Charles University Prague, Prague, Czech Republic
2. University of Manchester, Manchester, United Kingdom
Abstract
Fixed-radius near neighbor search is a fundamental data operation that retrieves all data points within a user-specified distance to a query point. There are efficient algorithms that can provide fast approximate query responses, but they often have a very compute-intensive indexing phase and require careful parameter tuning. Therefore, exact brute force and tree-based search methods are still widely used. Here we propose a new fixed-radius near neighbor search method, called SNN, that significantly improves over brute force and tree-based methods in terms of index and query time, provably returns exact results, and requires no parameter tuning. SNN exploits a sorting of the data points by their first principal component to prune the query search space. Further speedup is gained from an efficient implementation using high-level basic linear algebra subprograms (BLAS). We provide theoretical analysis of our method and demonstrate its practical performance when used stand-alone and when applied within the DBSCAN clustering algorithm.
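The pruning idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' SNN implementation: the function names `build_index` and `query_radius` are hypothetical, and NumPy stands in for direct BLAS calls. The sketch assumes Euclidean distance and relies on the fact that, for a unit vector v, |v·(x − q)| ≤ ‖x − q‖, so any point within the radius of the query must also have a first-principal-component score within the radius of the query's score.

```python
import numpy as np

def build_index(X):
    """Index phase (sketch): sort points by their first principal component score."""
    mu = X.mean(axis=0)
    Xc = X - mu
    # The first right singular vector of the centered data is the
    # first principal direction (unit norm).
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    v1 = vt[0]
    scores = Xc @ v1
    order = np.argsort(scores)
    Xs = Xc[order]
    return {
        "mu": mu,
        "v1": v1,
        "order": order,                           # sorted position -> original row
        "X": Xs,                                  # centered points, sorted by score
        "scores": scores[order],
        "sqnorms": np.einsum("ij,ij->i", Xs, Xs), # precomputed squared norms
    }

def query_radius(index, q, radius):
    """Query phase (sketch): return original indices of points within `radius` of q."""
    qc = q - index["mu"]
    s = qc @ index["v1"]
    # Only points whose score lies in [s - radius, s + radius] can qualify.
    lo = np.searchsorted(index["scores"], s - radius, side="left")
    hi = np.searchsorted(index["scores"], s + radius, side="right")
    cand = index["X"][lo:hi]
    # Squared distances via ||x||^2 - 2 x.q + ||q||^2; the matrix-vector
    # product is the part a BLAS routine would handle.
    d2 = index["sqnorms"][lo:hi] - 2.0 * (cand @ qc) + qc @ qc
    return np.sort(index["order"][lo:hi][d2 <= radius**2])
```

The index is built once and reused across queries, for example `idx = build_index(X)` followed by `query_radius(idx, X[0], 1.5)`. Because the score filter only discards points that cannot lie within the radius, the result matches a brute-force scan up to floating-point rounding at the boundary.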
Funder
Royal Society Industry Fellowship