Building Category Graphs Representation with Spatial and Temporal Attention for Visual Navigation

Author:

Hu Xiaobo1ORCID,Lin Youfang2ORCID,Fan Hehe3ORCID,Wang Shuo2ORCID,Wu Zhihao2ORCID,Lv Kai2ORCID

Affiliation:

1. Beijing Key Laboratory of Traffic Data Analysis and Mining, School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China

2. Beijing Key Laboratory of Traffic Data Analysis and Mining, School of Computer and Information Technology, Beijing Jiaotong University, Beijing China

3. College of Computer Science and Technology, Zhejiang University, zhejiang China

Abstract

Given an object of interest, visual navigation aims to reach the object’s location based on a sequence of partial observations. To this end, an agent needs to (1) acquire specific knowledge about the relations of object categories in the world during training and (2) locate the target object based on the pre-learned object category relations and its trajectory in the current unseen environment. In this article, we propose a Category Relation Graph (CRG) to learn the knowledge of object category layout relations and a Temporal-Spatial-Region attention (TSR) architecture to perceive the long-term spatial-temporal dependencies of objects, aiding navigation. We establish CRG to learn prior knowledge of object layout and deduce the positions of specific objects. Subsequently, we propose the TSR architecture to capture relationships among objects in temporal, spatial, and regions within observation trajectories. Specifically, we implement a Temporal attention module (T) to model the temporal structure of the observation sequence, implicitly encoding historical moving or trajectory information. Then, a Spatial attention module (S) uncovers the spatial context of the current observation objects based on CRG and past observations. Last, a Region attention module (R) shifts the attention to the target-relevant region. Leveraging the visual representation extracted by our method, the agent accurately perceives the environment and easily learns a superior navigation policy. Experiments on AI2-THOR demonstrate that our CRG-TSR method significantly outperforms existing methods in both effectiveness and efficiency. The supplementary material includes the code and will be publicly available.

Funder

National Natural Science Foundation of China

Aeronautical Science Foundation of China

Publisher

Association for Computing Machinery (ACM)

Reference73 articles.

1. Mobile robot navigation using neural networks and nonmetrical environmental models

2. Vision based MAV navigation in unknown and unstructured environments

3. Probabilistic Appearance Based Navigation and Loop Closing

4. A solution to the simultaneous localization and map building (SLAM) problem

5. Andrew Y. Ng, H. Jin Kim, Michael I. Jordan, and Shankar Sastry. 2003. Autonomous helicopter flight via reinforcement learning. In Advances in Neural Information Processing Systems. MIT Press, 799–806. Retrieved from https://proceedings.neurips.cc/paper/2003/hash/b427426b8acd2c2e53827970f2c2f526-Abstract.html

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3