Temporal-Spatial Redundancy Reduction in Video Sequences: A Motion-Based Entropy-Driven Attention Approach
Published: 2025-03-21
Volume: 10
Issue: 4
Page: 192
ISSN: 2313-7673
Container-title: Biomimetics
Language: en
Author:
Yuan Ye 1, Wu Baolei 1, Mo Zifan 2, Liu Weiye 1, Hong Ji 1, Li Zongdao 1, Liu Jian 1, Liu Na 1
Affiliation:
1. Institute of Machine Intelligence, University of Shanghai for Science and Technology, Shanghai 200093, China
2. School of Automation and Electronic Information, Xiangtan University, Xiangtan 411105, China
Abstract
Redundant video frames waste substantial computational resources in video-understanding tasks. Frame sampling is a crucial technique for improving resource utilization. However, existing sampling strategies typically adopt fixed-frame selection, which lacks flexibility in handling different action categories. In this paper, inspired by the neural mechanisms of the human visual pathway, we propose an effective and interpretable frame-sampling method, Entropy-Guided Motion Enhancement Sampling (EGMESampler), which removes redundant spatio-temporal information from videos. Our fundamental motivation is that motion information is an important signal for adaptively selecting frames from videos. EGMESampler therefore first performs motion modeling to separate motion information from irrelevant background. We then design an entropy-based dynamic sampling strategy driven by this motion information, ensuring that the sampled frames cover the important content of the video. Finally, we apply attention operations to the motion information and the sampled frames to enhance their motion expression and remove redundant spatial background information. EGMESampler can be embedded in existing video-processing algorithms, and experiments on five benchmark datasets demonstrate its effectiveness compared with previous fixed-sampling strategies, as well as its generalizability across different video models and datasets.
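The abstract describes a three-stage pipeline: motion modeling, entropy-guided adaptive sampling, and attention-based enhancement. The following minimal Python sketch illustrates only the general idea behind the middle stage, adaptive frame selection driven by a motion signal; the function name `motion_guided_sample`, the frame-difference motion proxy, and the cumulative-distribution quantile sampling are illustrative assumptions, not the authors' EGMESampler implementation.

```python
import numpy as np

def motion_guided_sample(frames: np.ndarray, num_samples: int) -> np.ndarray:
    """Pick frame indices adaptively where motion concentrates.

    frames: (T, H, W, C) array of decoded video frames.
    Returns num_samples indices into frames. Illustrative only; this is
    not the paper's EGMESampler, just the motion-driven sampling idea.
    """
    # Crude motion proxy: mean absolute difference between consecutive frames.
    diffs = np.abs(np.diff(frames.astype(np.float32), axis=0)).mean(axis=(1, 2, 3))
    diffs = np.concatenate([[diffs[0]], diffs])  # pad so every frame gets a score

    # Normalize motion scores into a probability distribution over frames.
    p = diffs / (diffs.sum() + 1e-8)

    # Cumulative motion distribution; sampling its evenly spaced quantiles
    # places more frames where motion accumulates fastest and fewer where
    # consecutive frames are nearly redundant.
    cdf = np.cumsum(p)
    targets = (np.arange(num_samples) + 0.5) / num_samples
    return np.clip(np.searchsorted(cdf, targets), 0, len(frames) - 1)

# Example: sample 8 frames from a 120-frame stand-in clip.
clip = np.random.rand(120, 32, 32, 3).astype(np.float32)
print(motion_guided_sample(clip, num_samples=8))
```

In the spirit of the abstract's entropy-based dynamic strategy, the Shannon entropy of the motion distribution p could further modulate how strongly sampling deviates from uniform (near-uniform motion implying near-uniform sampling), though that exact formulation is likewise an assumption here.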
Funder
National Natural Science Foundation of China; Pujiang Talents Plan of Shanghai; Artificial Intelligence Innovation and Development Special Fund of Shanghai