Paper Reading AI Learner

RealNet: Combining Optimized Object Detection with Information Fusion Depth Estimation Co-Design Method on IoT

2022-04-24 08:35:55
Zhuohao Li, Fandi Gou, Qixin De, Leqi Ding, Yuanhang Zhang, Yunze Cai

Abstract

Depth Estimation and Object Detection Recognition play an important role in autonomous driving technology under the guidance of deep learning artificial intelligence. We propose a hybrid structure called RealNet: a co-design method combining the model-streamlined recognition algorithm, the depth estimation algorithm with information fusion, and deploying them on the Jetson-Nano for unmanned vehicles with monocular vision sensors. We use ROS for experiment. The method proposed in this paper is suitable for mobile platforms with high real-time request. Innovation of our method is using information fusion to compensate the problem of insufficient frame rate of output image, and improve the robustness of target detection and depth estimation under monocular vision.Object Detection is based on YOLO-v5. We have simplified the network structure of its DarkNet53 and realized a prediction speed up to 0.01s. Depth Estimation is based on the VNL Depth Estimation, which considers multiple geometric constraints in 3D global space. It calculates the loss function by calculating the deviation of the virtual normal vector VN and the label, which can obtain deeper depth information. We use PnP fusion algorithm to solve the problem of insufficient frame rate of depth map output. It solves the motion estimation depth from three-dimensional target to two-dimensional point based on corner feature matching, which is faster than VNL calculation. We interpolate VNL output and PnP output to achieve information fusion. Experiments show that this can effectively eliminate the jitter of depth information and improve robustness. At the control end, this method combines the results of target detection and depth estimation to calculate the target position, and uses a pure tracking control algorithm to track it.

Abstract (translated)

URL

https://arxiv.org/abs/2204.11216

PDF

https://arxiv.org/pdf/2204.11216.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot