Autonomous drone cinematographer: Using artistic principles to create smooth, safe, occlusion-free trajectories for aerial filming

2018-08-28 22:28:20
Rogerio Bonatti, Yanfu Zhang, Sanjiban Choudhury, Wenshan Wang, Sebastian Scherer

Abstract

Autonomous aerial cinematography has the potential to enable automatic capture of aesthetically pleasing videos without requiring human intervention, empowering individuals with the capability of high-end film studios. Current approaches either only handle off-line trajectory generation, or offer strategies that reason over short time horizons and simplistic representations for obstacles, which result in jerky movement and low real-life applicability. In this work we develop a method for aerial filming that is able to trade off shot smoothness, occlusion, and cinematography guidelines in a principled manner, even under noisy actor predictions. We present a novel algorithm for real-time covariant gradient descent that we use to efficiently find the desired trajectories by optimizing a set of cost functions. Experimental results show that our approach creates attractive shots, avoiding obstacles and occlusion 65 times over 1.25 hours of flight time, re-planning at 5 Hz with a 10 s time horizon. We robustly film human actors, cars and bicycles performing different motion among obstacles, using various shot types.
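
To make the idea of covariant gradient descent over a sum of trajectory costs concrete, here is a minimal 2D sketch. It is not the authors' algorithm: it assumes a generic CHOMP-style update in which the gradient is preconditioned by the inverse of a smoothness metric A, and it uses only a smoothness cost plus a hypothetical circular-obstacle penalty. The occlusion and shot-quality costs from the abstract are omitted, and every name and parameter below (`optimize`, `eta`, `lam`, the obstacle model) is an assumption made for illustration.

```python
import numpy as np

def smoothness_metric(n):
    """Tridiagonal finite-difference matrix A for n free waypoints."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def smoothness_cost(xi, start, goal):
    """0.5 * sum of squared differences between consecutive waypoints."""
    path = np.vstack([start, xi, goal])
    return 0.5 * np.sum(np.diff(path, axis=0) ** 2)

def smoothness_grad(xi, start, goal, A):
    """Gradient of smoothness_cost with respect to the free waypoints."""
    g = A @ xi
    g[0] -= start
    g[-1] -= goal
    return g

def obstacle_cost_grad(xi, center, radius):
    """Hypothetical penalty 0.5 * (radius - dist)^2 inside a circle."""
    diff = xi - center
    dist = np.linalg.norm(diff, axis=1)
    pen = np.maximum(radius - dist, 0.0)
    cost = 0.5 * np.sum(pen ** 2)
    grad = -(pen / np.maximum(dist, 1e-9))[:, None] * diff
    return cost, grad

def optimize(start, goal, center, radius, n=20, eta=400.0, iters=500, lam=1.0):
    """Covariant gradient descent: xi <- xi - (1/eta) * A^{-1} * grad J(xi)."""
    start, goal, center = (np.asarray(v, dtype=float) for v in (start, goal, center))
    A = smoothness_metric(n)
    # Straight-line initialization with fixed endpoints excluded from xi.
    xi = np.linspace(start, goal, n + 2)[1:-1].copy()
    costs = []
    for _ in range(iters):
        obs_cost, obs_grad = obstacle_cost_grad(xi, center, radius)
        costs.append(smoothness_cost(xi, start, goal) + lam * obs_cost)
        grad = smoothness_grad(xi, start, goal, A) + lam * obs_grad
        # Preconditioning by A^{-1} spreads each local push along the path,
        # so the trajectory deforms smoothly instead of kinking.
        xi -= np.linalg.solve(A, grad) / eta
    obs_cost, _ = obstacle_cost_grad(xi, center, radius)
    costs.append(smoothness_cost(xi, start, goal) + lam * obs_cost)
    return xi, costs

if __name__ == "__main__":
    xi, costs = optimize(start=(0.0, 0.0), goal=(10.0, 0.0),
                         center=(5.0, -0.2), radius=1.5)
    print("initial cost:", costs[0], "final cost:", costs[-1])
```

In this sketch the step size `1/eta` is chosen conservatively for the example geometry; a real-time planner such as the one described above would add further cost terms and re-plan repeatedly as new actor predictions arrive.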

URL

https://arxiv.org/abs/1808.09563

PDF

https://arxiv.org/pdf/1808.09563.pdf

