Paper Reading AI Learner

Learning Optical Flow from Event Camera with Rendered Dataset

2023-03-20 10:44:32
Xinglong Luo, Kunming Luo, Ao Luo, Zhengning Wang, Ping Tan, Shuaicheng Liu

Abstract

We study the problem of estimating optical flow from event cameras. One important issue is how to build a high-quality event-flow dataset with accurate event values and flow labels. Previous datasets are created either by capturing real scenes with event cameras or by synthesizing from images with pasted foreground objects. The former approach produces real event values, but its flow labels are calculated and are therefore sparse and inaccurate. The latter approach can generate dense flow labels, but the interpolated events are prone to errors. In this work, we propose to render a physically correct event-flow dataset using computer graphics models. In particular, we first create indoor and outdoor 3D scenes in Blender with rich variations in scene content. Second, diverse camera motions are included in the virtual capture, producing images and accurate flow labels. Third, we render high-framerate videos between images to obtain accurate events. The rendered dataset allows the density of events to be adjusted, based on which we further introduce an adaptive density module (ADM). Experiments show that our proposed dataset can facilitate event-flow learning: previous approaches, when trained on our dataset, consistently improve their performance by a relatively large margin. In addition, event-flow pipelines equipped with our ADM can further improve performance.
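
The abstract does not spell out how events are derived from the rendered high-framerate video or how their density is controlled. As a rough illustration only, the sketch below uses the generic idealized event-camera model (a pixel fires an event each time its log intensity changes by a contrast threshold since its last event); it is an assumption for exposition, not the authors' pipeline. The function name `frames_to_events`, the `threshold` and `eps` parameters, and the use of frame indices as timestamps are all hypothetical simplifications. In this model, a lower threshold yields more events, which is one plausible way a rendered dataset could expose an event-density knob.

```python
import math
from typing import List, Tuple

# (frame_index, x, y, polarity); polarity is +1 (brighter) or -1 (darker)
Event = Tuple[int, int, int, int]


def frames_to_events(frames: List[List[List[float]]],
                     threshold: float = 0.2,
                     eps: float = 1e-3) -> List[Event]:
    """Hypothetical idealized event simulator (not the paper's method).

    frames: grayscale frames as nested lists [t][y][x] of intensities >= 0.
    threshold: contrast threshold in log-intensity units; smaller => more events.
    eps: offset that avoids log(0) on black pixels.
    """
    height, width = len(frames[0]), len(frames[0][0])
    # Per-pixel reference log intensity at that pixel's most recent event.
    ref = [[math.log(frames[0][y][x] + eps) for x in range(width)]
           for y in range(height)]
    events: List[Event] = []
    for t in range(1, len(frames)):
        for y in range(height):
            for x in range(width):
                cur = math.log(frames[t][y][x] + eps)
                diff = cur - ref[y][x]
                # Emit one event per full threshold crossing; each event
                # moves the reference one threshold step toward `cur`.
                while abs(diff) >= threshold:
                    polarity = 1 if diff > 0 else -1
                    events.append((t, x, y, polarity))
                    ref[y][x] += polarity * threshold
                    diff = cur - ref[y][x]
    return events
```

For example, a single pixel brightening from 1.0 to e changes its log intensity by roughly 1.0, so `frames_to_events([[[1.0]], [[math.e]]], threshold=0.2)` emits four positive events, while `threshold=0.5` emits only one. A real simulator would also interpolate event timestamps between frames rather than stamping them with the frame index.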

URL

https://arxiv.org/abs/2303.11011

PDF

https://arxiv.org/pdf/2303.11011.pdf