Affiliation:
1. Peking University, Beijing, China
Abstract
The growing memory demands of embedding tables in Deep Learning Recommendation Models (DLRMs) pose great challenges for model training and deployment. Existing embedding compression solutions cannot simultaneously meet three key design requirements: memory efficiency, low latency, and adaptability to dynamic data distributions. This paper presents CAFE, a Compact, Adaptive, and Fast Embedding compression framework that addresses all three requirements. The design philosophy of CAFE is to dynamically allocate more memory to important features (called hot features) and less memory to unimportant ones. In CAFE, we propose a fast and lightweight sketch data structure, named HotSketch, to capture feature importance and report hot features in real time. Each reported hot feature is assigned a unique embedding, while non-hot features share embeddings through the hash embedding technique. Guided by this design philosophy, we further propose a multi-level hash embedding framework to optimize the embedding tables of non-hot features. We theoretically analyze the accuracy of HotSketch and the model's convergence under the resulting embedding deviation. Extensive experiments show that CAFE significantly outperforms existing embedding compression methods, achieving 3.92% and 3.68% higher testing AUC on the Criteo Kaggle and CriteoTB datasets at a compression ratio of 10000x. The source code of CAFE is available on GitHub.
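The hot/cold split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the frequency sketch below is a simple Count-Min-style stand-in for HotSketch, and all class names, thresholds, and sizes are hypothetical choices for illustration; hot features receive a unique embedding row, while non-hot features are hashed into a small shared table.

```python
import numpy as np

class TinyFreqSketch:
    """Count-Min-style frequency sketch (illustrative stand-in for HotSketch)."""
    def __init__(self, depth=3, width=1024, seed=0):
        rng = np.random.default_rng(seed)
        self.seeds = [int(s) for s in rng.integers(1, 2**31 - 1, size=depth)]
        self.width = width
        self.table = np.zeros((depth, width), dtype=np.int64)

    def update(self, key):
        # Increment one counter per row, indexed by a seeded hash of the key.
        for r, s in enumerate(self.seeds):
            self.table[r, hash((s, key)) % self.width] += 1

    def estimate(self, key):
        # The minimum over rows upper-bounds the true count with small error.
        return min(self.table[r, hash((s, key)) % self.width]
                   for r, s in enumerate(self.seeds))

class HotColdEmbedding:
    """Hot features get unique embedding rows; cold features share hashed rows."""
    def __init__(self, dim=8, hot_capacity=100, shared_rows=64, hot_threshold=5):
        self.sketch = TinyFreqSketch()
        self.hot = {}  # feature id -> its own embedding vector
        self.shared = np.random.default_rng(1).normal(0, 0.01, (shared_rows, dim))
        self.shared_rows = shared_rows
        self.hot_capacity = hot_capacity
        self.hot_threshold = hot_threshold

    def lookup(self, feature_id):
        self.sketch.update(feature_id)
        if feature_id in self.hot:
            return self.hot[feature_id]
        if (len(self.hot) < self.hot_capacity
                and self.sketch.estimate(feature_id) >= self.hot_threshold):
            # Promote to hot: start from the shared row it currently maps to.
            self.hot[feature_id] = self.shared[feature_id % self.shared_rows].copy()
            return self.hot[feature_id]
        return self.shared[feature_id % self.shared_rows]
```

A feature looked up often enough crosses the (hypothetical) frequency threshold and is promoted to its own row; rare features keep sharing memory, which is what bounds the table size.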
Publisher
Association for Computing Machinery (ACM)
References: 84 articles.
Cited by: 1 article.