Shuffled ImageNet Banks for Video Event Detection and Search

Published: 2020-05-31 Issue: 2 Volume: 16 Pages: 1-21
ISSN: 1551-6857
Journal: ACM Transactions on Multimedia Computing, Communications, and Applications
Language: en
Abbreviated journal: ACM Trans. Multimedia Comput. Commun. Appl.

Authors:

Mettes, Pascal¹; Koelma, Dennis C.¹; Snoek, Cees G. M.¹

Affiliation:

1. University of Amsterdam, Amsterdam, the Netherlands

Abstract

This article addresses the detection and search of events in videos when video examples are scarce or even absent during training. ImageNet concept banks have been shown to be effective for such event detection and search. Rather than employing the standard concept bank of 1,000 ImageNet classes, we leverage the full 21,841-class dataset. We identify two problems with using the full dataset: (i) the number of examples per concept is imbalanced, and (ii) not all concepts are equally relevant for events. In this article, we propose to balance large-scale image hierarchies for pre-training. We shuffle concepts with bottom-up and top-down operations to overcome the problems of example imbalance and concept relevance. Using this strategy, we arrive at the shuffled ImageNet bank, a concept bank with an order of magnitude more concepts than standard ImageNet banks. Compared to standard ImageNet pre-training, our shuffles yield more discriminative representations for training event models from the limited video event examples. For event search, the broad range of concepts enables a closer match between textual event queries and concept detections in videos. Experimentally, we show the benefit of the proposed bank for both event detection and event search, with state-of-the-art performance for both tasks on the challenging TRECVID Multimedia Event Detection and Ad-Hoc Video Search benchmarks.
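The abstract does not spell out the shuffle operations. To illustrate the general idea of countering example imbalance in a concept hierarchy bottom-up, the Python sketch below merges every concept with fewer than a threshold number of images into its parent, visiting the deepest concepts first. The function names `merge_small_concepts` and `concept_depth`, the `min_examples` threshold, and the toy hierarchy with its image counts are all hypothetical; this is one possible balancing operation under those assumptions, not the authors' procedure, and the top-down operations for concept relevance are not shown.

```python
def concept_depth(concept, parent):
    """Number of edges from `concept` up to the root (the root has depth 0)."""
    depth = 0
    while parent[concept] is not None:
        concept = parent[concept]
        depth += 1
    return depth


def merge_small_concepts(parent, counts, min_examples):
    """Hypothetical bottom-up balancing of a concept hierarchy (illustration only).

    parent:       concept -> parent concept (the root maps to None)
    counts:       concept -> number of training images
    min_examples: assumed minimum number of images a concept should have

    Returns (mapping, kept): mapping sends every original concept to the
    concept whose label its images end up under; kept gives the image count
    of each surviving concept that has at least one image.
    """
    counts = dict(counts)  # work on a copy; the caller's dict is untouched
    merged_into = {}
    # Visit the deepest concepts first so children are settled before parents.
    order = sorted(parent, key=lambda c: concept_depth(c, parent), reverse=True)
    for concept in order:
        p = parent[concept]
        if p is not None and counts.get(concept, 0) < min_examples:
            # Too few images: fold them (including any already-merged children) upward.
            counts[p] = counts.get(p, 0) + counts.get(concept, 0)
            counts[concept] = 0
            merged_into[concept] = p

    def resolve(concept):
        # Follow merge links until reaching a concept that was not merged.
        while concept in merged_into:
            concept = merged_into[concept]
        return concept

    mapping = {c: resolve(c) for c in parent}
    kept = {c: counts.get(c, 0) for c in parent
            if mapping[c] == c and counts.get(c, 0) > 0}
    return mapping, kept


# Toy hierarchy with made-up image counts (hypothetical data).
parent = {"entity": None, "animal": "entity", "dog": "animal",
          "husky": "dog", "beagle": "dog", "cat": "animal"}
counts = {"entity": 0, "animal": 60, "dog": 100,
          "husky": 300, "beagle": 20, "cat": 50}
mapping, kept = merge_small_concepts(parent, counts, min_examples=50)
print(mapping["beagle"])  # dog
print(kept)  # {'animal': 60, 'dog': 120, 'husky': 300, 'cat': 50}
```

Merging upward rather than discarding rare concepts keeps every image in the bank under some label; the root is never merged, so it is retained even if its total stays below the threshold.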

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications, Hardware and Architecture

Link

https://dl.acm.org/doi/pdf/10.1145/3377875

Cited by 20 articles.

1. Exploiting Instance-level Relationships in Weakly Supervised Text-to-Video Retrieval; ACM Transactions on Multimedia Computing, Communications, and Applications; 2024-09-12

2. Transferable dual multi-granularity semantic excavating for partially relevant video retrieval; Image and Vision Computing; 2024-09

3. Balancing Privacy and Utility in Surveillance Systems: An Overview; 2023 International Conference on Platform Technology and Service (PlatCon); 2023-08-16

4. Universal Prototype Transport for Zero-Shot Action Recognition and Localization; International Journal of Computer Vision; 2023-07-19

5. Cross-Lingual Cross-Modal Retrieval with Noise-Robust Learning; Proceedings of the 30th ACM International Conference on Multimedia; 2022-10-10