Affiliation:
1. School of Electronics and Information Technology, Sun Yat-Sen University, Guangdong, People's Republic of China
Abstract
In this article, we propose a novel Visual-Semantic Double Attention (VSDA) model for image captioning. VSDA consists of two parts: a modified visual attention model extracts sub-region image features, and a new SEmantic Attention (SEA) model distills semantic features. Traditional attribute-based models neglect the distinct importance of each attribute word and fuse all of them into recurrent neural networks, introducing abundant irrelevant semantic features. In contrast, at each timestep our model selects the attribute word most relevant to the current context. In other words, the real power of VSDA lies not only in leveraging semantic features but also in eliminating the influence of irrelevant attribute words, making the semantic guidance more precise. Furthermore, our approach addresses the problem that visual attention models cannot help generate non-visual words. Since visual and semantic features are complementary, our model leverages both to strengthen the generation of visual and non-visual words alike. Extensive experiments on the widely used MS COCO and Flickr30k datasets show that VSDA outperforms competing methods and achieves promising performance.
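As a rough illustration of the attribute-selection idea described in the abstract, the sketch below shows one way a semantic attention over predicted attribute-word embeddings could be conditioned on the decoder's hidden state at each timestep. It is not taken from the paper: the module name, dimensions, and the soft-selection (softmax-weighted) formulation are assumptions made for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticAttention(nn.Module):
    """Illustrative sketch (assumed design, not the paper's implementation):
    score each attribute-word embedding against the current decoder state and
    return a soft selection emphasizing the most relevant attribute word."""

    def __init__(self, embed_dim, hidden_dim, attn_dim=256):
        super().__init__()
        self.proj_h = nn.Linear(hidden_dim, attn_dim)  # project decoder state
        self.proj_a = nn.Linear(embed_dim, attn_dim)   # project attribute embeddings
        self.score = nn.Linear(attn_dim, 1)            # scalar relevance score

    def forward(self, hidden, attr_embeds):
        # hidden:      (batch, hidden_dim)   decoder state at the current timestep
        # attr_embeds: (batch, K, embed_dim) embeddings of K predicted attribute words
        energy = torch.tanh(self.proj_h(hidden).unsqueeze(1) + self.proj_a(attr_embeds))
        weights = F.softmax(self.score(energy).squeeze(-1), dim=1)   # (batch, K)
        context = (weights.unsqueeze(-1) * attr_embeds).sum(dim=1)   # (batch, embed_dim)
        return context, weights
```

In such a setup, the returned semantic context would typically be concatenated with the visually attended image feature and fed to the recurrent decoder at each step, so that visual words draw on the visual branch while non-visual words can lean on the selected attribute word.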
Funder
Science and Technology Program of Guangzhou of China
Fundamental Research Funds for the Central Universities of China
National Natural Science Foundation of China
Natural Science Foundation of Guangdong Province
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Networks and Communications, Hardware and Architecture
Cited by
16 articles.