Paper Reading AI Learner

Towards a Robust Aerial Cinematography Platform: Localizing and Tracking Moving Targets in Unstructured Environments

2019-04-04 02:37:05
Rogerio Bonatti, Cherie Ho, Wenshan Wang, Sanjiban Choudhury, Sebastian Scherer

Abstract

The use of drones for aerial cinematography has revolutionized several applications and industries requiring live and dynamic camera viewpoints such as entertainment, sports, and security. However, safely controlling a drone while filming a moving target usually requires multiple expert human operators; hence the need for an autonomous cinematographer. Current approaches have severe real-life limitations such as requiring scripted scenes that can be solved offline, high-precision motion-capture systems or GPS tags to localize targets, and prior maps of the environment to avoid obstacles and plan for occlusion. In this work, we overcome such limitations and propose a complete system for aerial cinematography that combines: (1) a visual pose estimation algorithm for target localization; (2) a real-time incremental 3D signed-distance map algorithm for occlusion and safety computation; and (3) a real-time camera motion planner that optimizes smoothness, collisions, occlusions and artistic guidelines. We evaluate robustness and real-time performance in series of field experiments and simulations by tracking dynamic targets moving through unknown, unstructured environments. Finally, we verify that despite removing previous limitations, our system still matches state-of-the-art performance. Videos of the system in action can be seen at https://youtu.be/ZE9MnCVmumc

Abstract (translated)

无人机在航空摄影中的应用已经彻底改变了一些需要实时和动态摄像机视角的应用和行业,如娱乐、体育和安全。然而,在拍摄移动目标的同时安全地控制无人机通常需要多个专业的人工操作人员;因此需要一个自主摄影师。当前的方法具有严重的现实限制,例如需要可以离线解决的脚本场景、高精度运动捕获系统或GPS标记来定位目标,以及以前的环境地图以避免障碍并计划遮挡。

URL

https://arxiv.org/abs/1904.02319

PDF

https://arxiv.org/pdf/1904.02319.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot