Deep Dual-Modal Traffic Objects Instance Segmentation Method Using Camera and LIDAR Data for Autonomous Driving

Published: 2020-10-09
Journal: Remote Sensing, Volume 12, Issue 20, Page 3274
ISSN: 2072-4292
Language: en

Authors:
Geng Keke (1), Dong Ge (2), Yin Guodong (1), Hu Jingyu (1)

Affiliations:
1. School of Mechanical Engineering, Southeast University, Nanjing 211189, China
2. Institute of Aeronautics and Astronautics, Tsinghua University, Beijing 100084, China
Abstract
Recent advancements in environmental perception for autonomous vehicles have been driven by deep learning-based approaches. However, effective traffic target detection in complex environments remains a challenging task. This paper presents a novel dual-modal instance segmentation deep neural network (DM-ISDNN) that merges camera and LIDAR data, enabling efficient target detection in complex environments through multi-sensor data fusion. Due to the sparseness of the LIDAR point cloud data, we propose a weight assignment function that assigns different weight coefficients to different feature pyramid convolutional layers of the LIDAR sub-network. We compare and analyze early-, middle-, and late-stage fusion architectures in depth. Considering both detection accuracy and detection speed, the middle-stage fusion architecture with the weight assignment mechanism is selected as the best-performing design. This work has great significance for exploring the best feature fusion scheme for a multi-modal neural network. In addition, we apply a mask distribution function to improve the quality of the predicted mask. A dual-modal traffic object instance segmentation dataset is established using 7481 camera and LIDAR data pairs from the KITTI dataset, with 79,118 manually annotated instance masks. To the best of our knowledge, no existing instance annotation for the KITTI dataset matches this quality and volume. A novel dual-modal dataset, composed of 14,652 camera and LIDAR data pairs, is collected using our own developed autonomous vehicle under different environmental conditions in real driving scenarios, for which a total of 62,579 instance masks are obtained using a semi-automatic annotation method. This dataset can be used to validate the detection performance of instance segmentation networks under complex environmental conditions.
Experimental results on the dual-modal KITTI Benchmark demonstrate that DM-ISDNN using middle-stage data fusion and the weight assignment mechanism has better detection performance than single- and dual-modal networks with other data fusion strategies, which validates the robustness and effectiveness of the proposed method. Meanwhile, compared to the state-of-the-art instance segmentation networks, our method shows much better detection performance, in terms of AP and F1 score, on the dual-modal dataset collected under complex environmental conditions, which further validates the superiority of our method.
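The middle-stage fusion with per-level weighting described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the function name, the element-wise weighted-sum fusion rule, and the weight values are all assumptions; the paper's weight assignment function is not reproduced here.

```python
import numpy as np

def fuse_pyramid_features(cam_feats, lidar_feats, lidar_weights):
    """Middle-stage fusion sketch: combine camera and LIDAR feature
    pyramid levels, down-weighting the sparse LIDAR responses.

    cam_feats / lidar_feats: lists of (C, H, W) arrays, one per pyramid level.
    lidar_weights: one scalar per level (hypothetical values; the paper's
    weight assignment function is not reproduced here).
    """
    fused = []
    for cam, lid, w in zip(cam_feats, lidar_feats, lidar_weights):
        # Element-wise weighted sum per pyramid level; sparser LIDAR
        # levels receive smaller weights in this sketch.
        fused.append(cam + w * lid)
    return fused

# Toy example: 3 pyramid levels with decreasing spatial resolution.
cam = [np.ones((8, s, s)) for s in (32, 16, 8)]
lid = [np.ones((8, s, s)) for s in (32, 16, 8)]
weights = [0.8, 0.5, 0.2]  # assumed values, for illustration only
out = fuse_pyramid_features(cam, lid, weights)
```

The per-level scalar stands in for the paper's weight assignment mechanism; in a real network the fused maps would then feed the shared detection and mask heads.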
Funders:
National Natural Science Foundation of China
National Natural Science Foundation of Jiangsu Province
Cited by 30 articles.