Paper Reading AI Learner

Can a Robot Become a Movie Director? Learning Artistic Principles for Aerial Cinematography

2019-04-04 14:30:09
Mirko Gschwindt, Efe Camci, Rogerio Bonatti, Wenshan Wang, Erdal Kayacan, Sebastian Scherer

Abstract

Aerial filming is becoming increasingly popular thanks to recent advances in drone technology. It raises many intriguing, unsolved problems at the intersection of aesthetic and scientific challenges. In this work, we propose an intelligent agent that uses deep reinforcement learning to supervise the motion planning of a filming drone based on the aesthetic value of video shots. Unlike current state-of-the-art approaches, which mostly require explicit guidance from a human expert, our drone learns from experience how to make favorable shot type selections. We propose a learning scheme that exploits the aesthetic features of retrospective shots to extract a desirable policy for better prospective shots. We train our agent in realistic AirSim simulations using both hand-crafted and human reward functions. We deploy the same agent on a real DJI M210 drone to test how well our approach generalizes to real-world conditions. Finally, to evaluate the success of our approach, we conduct a comprehensive user study in which participants rate the shots taken with our method and write comments about them.
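The abstract does not specify the agent's state, actions, or reward, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It swaps in tabular Q-learning for the paper's deep reinforcement learning. The state is the previous shot type, the action is the next shot type, and each shot is scored after it has been filmed, loosely mirroring "aesthetic features of retrospective shots". The shot types, `handcrafted_reward`, `train`, `greedy_policy`, and all hyperparameters are hypothetical; a function wrapping human ratings could be passed as `reward_fn` instead of the hand-crafted one.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical discrete shot types relative to the filmed actor (assumption).
SHOT_TYPES: Tuple[str, ...] = ("left", "right", "front", "back")

# A reward function scores a finished shot given the previous and chosen shot.
RewardFn = Callable[[int, int], float]


def handcrafted_reward(prev_shot: int, next_shot: int) -> float:
    """Hypothetical hand-crafted aesthetic score of a finished shot:
    penalize holding the same shot type, reward visual variety."""
    return -1.0 if next_shot == prev_shot else 1.0


def train(reward_fn: RewardFn = handcrafted_reward,
          steps: int = 20_000,
          alpha: float = 0.1,
          gamma: float = 0.9,
          epsilon: float = 0.2,
          seed: int = 0) -> List[List[float]]:
    """Tabular Q-learning: state = previous shot type, action = next shot type."""
    rng = random.Random(seed)
    n = len(SHOT_TYPES)
    q = [[0.0] * n for _ in range(n)]
    state = rng.randrange(n)
    for _ in range(steps):
        # Epsilon-greedy shot type selection.
        if rng.random() < epsilon:
            action = rng.randrange(n)
        else:
            action = max(range(n), key=lambda a: q[state][a])
        # Score the shot after it has been filmed (retrospective reward).
        reward = reward_fn(state, action)
        # The shot just taken becomes the context for the next decision.
        next_state = action
        td_target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (td_target - q[state][action])
        state = next_state
    return q


def greedy_policy(q: List[List[float]]) -> Dict[str, str]:
    """Map each previous shot type to the shot type the agent picks next."""
    return {
        SHOT_TYPES[s]: SHOT_TYPES[max(range(len(row)), key=lambda a: row[a])]
        for s, row in enumerate(q)
    }


if __name__ == "__main__":
    print(greedy_policy(train()))
```

With this toy reward the learned policy never repeats the previous shot type. A deep reinforcement learning agent, as described in the abstract, would replace the table with a neural network over richer shot features.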

URL

https://arxiv.org/abs/1904.02579

PDF

https://arxiv.org/pdf/1904.02579.pdf
