Unifying Event-based Flow, Stereo and Depth Estimation via Feature Similarity Matching

2024-07-31 16:43:20
Pengjie Zhang, Lin Zhu, Lizhi Wang, Hua Huang

Abstract

As an emerging vision sensor, the event camera has gained popularity in vision tasks such as optical flow estimation, stereo matching, and depth estimation because of its high-speed, sparse, and asynchronous event streams. Unlike traditional approaches that use a specialized architecture for each task, we propose a unified framework, EventMatch, that reformulates these tasks as an event-based dense correspondence matching problem, allowing them to be solved with a single model by directly comparing feature similarities. Using a shared feature similarity module, which integrates knowledge from other event streams via temporal or spatial interactions, together with distinct task heads, our network can concurrently perform optical flow estimation from temporal inputs (e.g., two segments of event streams in the temporal domain) and stereo matching from spatial inputs (e.g., two segments of event streams from different viewpoints in the spatial domain). We further demonstrate that our unified model inherently supports cross-task transfer, since the architecture and parameters are shared across tasks. Without retraining on each task, our model can handle optical flow and disparity estimation simultaneously. Experiments on the DSEC benchmark demonstrate that our model outperforms existing state-of-the-art methods in both optical flow and disparity estimation. Our unified approach not only advances event-based models but also opens new possibilities for cross-task transfer and inter-task fusion in both spatial and temporal dimensions. Our code will be released later.
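
To make the idea of dense correspondence via feature similarity more concrete, here is a minimal NumPy sketch. It is not the authors' implementation (their code is not yet released). It assumes that per-pixel feature maps have already been extracted from two event segments, and it uses a generic softmax-weighted matching scheme. The function names `match_flow` and `match_disparity`, the scaling by the square root of the channel count, and the non-negative disparity mask are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; -inf entries receive zero probability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def match_flow(feat0, feat1):
    """Hypothetical 2D matching head: (H, W, C) features -> (H, W, 2) flow (dx, dy)."""
    h, w, c = feat0.shape
    f0 = feat0.reshape(h * w, c)
    f1 = feat1.reshape(h * w, c)
    # Similarity of every pixel in feat0 with every pixel in feat1.
    corr = f0 @ f1.T / np.sqrt(c)
    prob = softmax(corr, axis=-1)
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    grid = np.stack([xs, ys], axis=-1).reshape(h * w, 2).astype(np.float64)
    # Expected matching coordinate minus the source coordinate gives the flow.
    matched = prob @ grid
    return (matched - grid).reshape(h, w, 2)

def match_disparity(feat_left, feat_right):
    """Hypothetical 1D matching head: (H, W, C) features -> (H, W) disparity."""
    h, w, c = feat_left.shape
    # Similarity only along the same image row (rectified stereo assumed).
    corr = np.einsum("hic,hjc->hij", feat_left, feat_right) / np.sqrt(c)
    # Assume disparity >= 0: a left pixel at x can only match right pixels at x' <= x.
    invalid = np.triu(np.ones((w, w), dtype=bool), k=1)
    corr = np.where(invalid[None], -np.inf, corr)
    prob = softmax(corr, axis=-1)
    xs = np.arange(w, dtype=np.float64)
    matched_x = prob @ xs
    return xs[None, :] - matched_x
```

Both functions share the same similarity computation and differ only in the search space (the full image for flow, a single row for stereo), which is the sense in which one matching module could serve both tasks in this sketch.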

URL

https://arxiv.org/abs/2407.21735

PDF

https://arxiv.org/pdf/2407.21735.pdf

