PathFinder: Attention-Driven Dynamic Non-Line-of-Sight Tracking with a Mobile Robot

2024-04-07 17:31:53
Shenbagaraj Kannapiran, Sreenithy Chandran, Suren Jayasuriya, Spring Berman

Abstract

The study of non-line-of-sight (NLOS) imaging is growing due to its many potential applications, including rescue operations and pedestrian detection by self-driving cars. However, implementing NLOS imaging on a moving camera remains an open area of research. Existing NLOS imaging methods rely on time-resolved detectors and laser configurations that require precise optical alignment, making it difficult to deploy them in dynamic environments. This work proposes a data-driven approach to NLOS imaging, PathFinder, that can be used with a standard RGB camera mounted on a small, power-constrained mobile robot, such as an aerial drone. Our experimental pipeline is designed to accurately estimate the 2D trajectory of a person who moves in a Manhattan-world environment while remaining hidden from the camera's field of view. We introduce a novel approach to process a sequence of dynamic successive frames in a line-of-sight (LOS) video using an attention-based neural network that performs inference in real time. The method also includes a preprocessing selection metric that analyzes images from a moving camera that contain multiple vertical planar surfaces, such as walls and building facades, and extracts the planes that return maximum NLOS information. We validate the approach on in-the-wild scenes using a drone for video capture, thus demonstrating low-cost NLOS imaging in dynamic capture environments.

URL

https://arxiv.org/abs/2404.05024

PDF

https://arxiv.org/pdf/2404.05024.pdf

