Decoupled Progressive Distillation for Sequential Prediction with Interaction Dynamics

Published: 2023-12-29 Issue: 3 Volume: 42 Page: 1-35
ISSN: 1046-8188
Container-title: ACM Transactions on Information Systems
Language: en
Short-container-title: ACM Trans. Inf. Syst.

Author:

Hu Kaixi¹, Li Lin², Xie Qing², Liu Jianquan³, Tao Xiaohui⁴, Xu Guandong⁵

Affiliation:

1. School of Computer Science, University of Technology Sydney, Australia and School of Computer Science and Artificial Intelligence, Wuhan University of Technology, China

2. School of Computer Science and Artificial Intelligence, Wuhan University of Technology, China

3. Visual Intelligence Research Laboratories, NEC Corporation, Japan

4. School of Mathematics, Physics and Computing, University of Southern Queensland, Australia

5. School of Computer Science, University of Technology Sydney, Australia

Abstract

Sequential prediction has great value for resource allocation due to its capability to analyze intents for next-item prediction. A fundamental challenge arises from real-world interaction dynamics, where similar sequences involving multiple intents may exhibit different next items. More importantly, the large volume of candidate items in sequential prediction may amplify such dynamics, making it hard for deep networks to capture comprehensive intents. This article presents a sequential prediction framework with Decoupled Progressive Distillation (DePoD), drawing on the progressive nature of human cognition. We redefine target and non-target item distillation according to their different effects in the decoupled formulation. This is achieved through two aspects: (1) Regarding how to learn, our target item distillation with progressive difficulty increases the contribution of low-confidence samples in the later training phase while keeping high-confidence samples in the earlier phase. The non-target item distillation starts from a small subset of non-target items, whose size increases according to the item frequency. (2) Regarding whom to learn from, a difference evaluator is utilized to progressively select an expert that provides informative knowledge among items from the cohort of peers. Extensive experiments on four public datasets show that DePoD outperforms state-of-the-art methods in terms of accuracy-based metrics.
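
The abstract does not give the loss formulas, so the following is only a minimal sketch of the general idea of splitting a distillation loss into a target term and a non-target term, with the non-target term computed over a growing subset of items. It is not the DePoD implementation. All names (`progressive_subset`, `decoupled_kd_loss`, `min_size`) are hypothetical, and the linear growth schedule and the choice to admit the most frequent items first are assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def progressive_subset(item_freq, target, epoch, total_epochs, min_size=2):
    """Assumed schedule (not from the paper): rank non-target items by
    frequency and let the subset grow linearly with training progress."""
    non_targets = [i for i in range(len(item_freq)) if i != target]
    non_targets.sort(key=lambda i: item_freq[i], reverse=True)
    progress = (epoch + 1) / total_epochs
    size = max(min_size, math.ceil(progress * len(non_targets)))
    return non_targets[:size]


def decoupled_kd_loss(student_logits, teacher_logits, target, subset,
                      temperature=1.0):
    """Return (target_term, non_target_term) of a decoupled distillation loss."""
    s = softmax(student_logits, temperature)
    t = softmax(teacher_logits, temperature)
    # Target term: binary distribution "target item vs. everything else".
    target_term = kl_divergence([t[target], 1 - t[target]],
                                [s[target], 1 - s[target]])
    # Non-target term: both distributions renormalised over the subset only.
    t_mass = sum(t[i] for i in subset)
    s_mass = sum(s[i] for i in subset)
    t_sub = [t[i] / t_mass for i in subset]
    s_sub = [s[i] / s_mass for i in subset]
    return target_term, kl_divergence(t_sub, s_sub)


# Illustrative usage with made-up frequencies and logits.
freq = [50, 5, 30, 1, 12]
subset = progressive_subset(freq, target=1, epoch=0, total_epochs=4)
terms = decoupled_kd_loss([0.0] * 5, [2.0, 1.0, 0.5, 0.0, 1.0], 1, subset)
print(subset, terms)
```

Returning the two terms separately leaves their weighting to the caller, which is the point of a decoupled formulation: each term can follow its own schedule during training.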

Funder

National Natural Science Foundation of China

Key Research and Development Program of Hubei Province

University of Technology Sydney

China Scholarship Council

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Science Applications; General Business, Management and Accounting; Information Systems

Link

https://dl.acm.org/doi/pdf/10.1145/3632403
