Affiliation:
1. The Ohio State University
2. Freelance
Abstract
Tree edit distance (TED) is a metric that quantifies the dissimilarity between two trees, and it is used in a wide spectrum of applications in artificial intelligence, bioinformatics, and other areas. As applications continue to scale in data size, with a growing demand for fast response time, TED computation has become increasingly data- and compute-intensive. Over the years, researchers have made dedicated efforts to improve sequential TED algorithms by reducing their high complexity. However, achieving efficient parallel TED computation in both algorithm and implementation is challenging because of its dynamic programming nature, which raises non-trivial issues of data dependency, changing runtime execution patterns, and optimal use of limited parallel resources.
Having comprehensively investigated the bottlenecks in existing parallel TED algorithms, we develop a massively parallel computation framework for TED, together with its GPU implementation, called X-TED. For a given TED computation, X-TED applies a fast preprocessing algorithm to identify dependency relationships among millions of dynamic programming tables. It then adopts a dynamic parallel strategy to handle the various processing stages, aiming to make the best use of GPU cores and the limited device memory in an adaptive and automatic way. Our extensive experimental results demonstrate that X-TED surpasses all existing solutions, achieving up to 42x speedup over the state-of-the-art sequential AP-TED and an average speedup of 31x over the existing multicore parallel MC-TED.
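For readers unfamiliar with the metric, the sketch below illustrates what TED measures, using the textbook forest recursion with unit costs for insert, delete, and relabel. It is not the X-TED algorithm, nor AP-TED or MC-TED; the tuple-based tree encoding and the names `size`, `forest_dist`, and `ted` are assumptions made only for this illustration. Each memoized call corresponds to one subproblem, and its reliance on smaller subproblems is the kind of data dependency the abstract refers to.

```python
from functools import lru_cache

# Hypothetical encoding for this sketch: a tree is (label, children),
# where children is a tuple of trees; a forest is a tuple of trees.

def size(forest):
    """Number of nodes in an ordered forest."""
    return sum(1 + size(children) for _, children in forest)

@lru_cache(maxsize=None)
def forest_dist(f, g):
    """Unit-cost edit distance between two ordered forests."""
    if not f and not g:
        return 0
    if not f:
        return size(g)          # insert every node of g
    if not g:
        return size(f)          # delete every node of f
    (lv, cv), (lw, cw) = f[-1], g[-1]   # rightmost roots and their children
    # Delete rightmost root of f: its children take its place.
    delete = forest_dist(f[:-1] + cv, g) + 1
    # Insert rightmost root of g: its children take its place.
    insert = forest_dist(f, g[:-1] + cw) + 1
    # Map the two rightmost roots onto each other (relabel if labels differ).
    match = forest_dist(cv, cw) + forest_dist(f[:-1], g[:-1]) + (lv != lw)
    return min(delete, insert, match)

def ted(t1, t2):
    """Tree edit distance between two ordered labeled trees."""
    return forest_dist((t1,), (t2,))

if __name__ == "__main__":
    a = ("a", (("b", ()), ("c", ())))
    b = ("a", (("b", ()), ("d", ())))
    print(ted(a, b))  # one relabel (c -> d), so 1
```

This plain recursion is only practical for small trees; the sequential and parallel algorithms named in the abstract exist precisely because the number of subproblems grows quickly with tree size.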
Publisher
Association for Computing Machinery (ACM)