Efficient Top-k Frequent Itemset Mining on Massive Data

Published: 2024-02-06 Issue: 2 Volume: 9 Pages: 177-203
ISSN: 2364-1185
Container-title: Data Science and Engineering
Language: en
Short-container-title: Data Sci. Eng.

Author:

Wan Xiaolong, Han Xixian

Abstract

Top-k frequent itemset mining (top-k FIM) plays an important role in many practical applications. It reports the k itemsets with the highest supports. Instead of the minimum support threshold that FIM requires, which is hard to choose well, top-k FIM needs only the more intuitive parameter k, the number of results. Existing algorithms require at least two scans of the table and incur high execution cost on massive data. This paper develops PTF, a prefix-partitioning-based algorithm that mines top-k frequent itemsets efficiently, where each prefix-based partition keeps the transactions sharing the same prefix item. PTF can directly skip most of the partitions, namely those that cannot generate any top-k frequent itemset. Vertical mining processes the partitions in vertical representation under the high-support-first principle, so only a small fraction of the items are involved in processing the partitions. Two improvements reduce execution cost further: a hybrid vertical storage mode maintains the prefix-based partitions adaptively, and candidate pruning reduces the number of explored candidates. Extensive experimental results show that, on massive data, PTF achieves up to a 1348.53-fold speedup and incurs up to 355.31 times less I/O cost than the state-of-the-art algorithms.
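
To make the problem statement concrete, the following is a minimal in-memory sketch of top-k FIM over a vertical representation, exploring candidates in high-support-first order. It is not the PTF algorithm from the paper: it has no prefix-based partitions, no hybrid storage mode, and no I/O handling. The function name `top_k_itemsets` and the toy data are assumptions made for illustration only.

```python
import heapq
from itertools import count


def top_k_itemsets(transactions, k):
    """Return up to k (itemset, support) pairs with the highest supports.

    Illustrative sketch only (not PTF): a best-first search over tidsets.
    """
    # Vertical representation: map each item to the set of transaction ids
    # (its "tidset") in which the item occurs.
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in set(items):
            tidsets.setdefault(item, set()).add(tid)

    # Fix a global item order: descending support, ties broken by item.
    order = sorted(tidsets, key=lambda i: (-len(tidsets[i]), i))

    tie = count()  # tie-breaker so the heap never compares tuples or sets
    heap = []
    for pos, item in enumerate(order):
        entry = (-len(tidsets[item]), next(tie), (item,), pos, tidsets[item])
        heapq.heappush(heap, entry)

    result = []
    while heap and len(result) < k:
        neg_sup, _, itemset, last, tids = heapq.heappop(heap)
        result.append((itemset, -neg_sup))
        # Extend only with items later in the order, so every itemset is
        # generated exactly once. An extension never has higher support
        # than its parent, so popping the heap maximum is safe.
        for pos in range(last + 1, len(order)):
            ext = tids & tidsets[order[pos]]
            if ext:
                entry = (-len(ext), next(tie), itemset + (order[pos],), pos, ext)
                heapq.heappush(heap, entry)
    return result


# Hypothetical toy table of five transactions.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a"]]
top3 = top_k_itemsets(transactions, 3)
```

The sketch needs only the result count k and no minimum support threshold, which is the usability point the abstract makes. Because support can only shrink when an itemset is extended, the search stops as soon as k itemsets have been reported, without enumerating the remaining candidates.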

Funder

National Natural Science Foundation of China

Taishan Scholars Program of Shandong Province

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s41019-024-00241-2.pdf
