Paper Reading AI Learner

EDCFlow: Exploring Temporally Dense Difference Maps for Event-based Optical Flow Estimation

2025-06-04 02:55:04
Daikun Liu, Lei Cheng, Teng Wang, Changyin Sun

Abstract

Recent learning-based methods for event-based optical flow estimation utilize cost volumes for pixel matching but suffer from redundant computation and limited scalability to higher resolutions for flow refinement. In this work, we take advantage of the complementarity between the temporally dense feature differences of adjacent event frames and the cost volume, and present a lightweight event-based optical flow network (EDCFlow) to achieve high-quality flow estimation at a higher resolution. Specifically, an attention-based multi-scale temporal feature difference layer is developed to capture diverse motion patterns at high resolution in a computationally efficient manner. An adaptive fusion of high-resolution difference motion features and low-resolution correlation motion features is performed to enhance motion representation and model generalization. Notably, EDCFlow can serve as a plug-and-play refinement module for RAFT-like event-based methods to enhance flow details. Extensive experiments demonstrate that EDCFlow achieves better performance with lower complexity compared to existing methods, offering superior generalization.
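The core idea above — pairing temporally dense difference maps of adjacent event frames with multi-scale, attention-weighted aggregation — can be sketched roughly as follows. This is an illustrative NumPy sketch, not the authors' implementation: the frame shapes, the use of average pooling for the multi-scale pyramid, and the softmax attention weights derived from mean activation magnitude are all assumptions made here for illustration.

```python
import numpy as np

def temporal_difference_maps(frames):
    """Dense difference maps between adjacent event frames.

    frames: (T, H, W) stack of event frames.
    Returns (T-1, H, W) difference maps — an illustrative stand-in
    for the feature differences an EDCFlow-style network would use.
    """
    return frames[1:] - frames[:-1]

def multiscale_attention_pool(diffs, scales=(1, 2, 4)):
    """Pool difference maps at several scales and fuse them with
    softmax attention weights (assumption: the weight of each scale
    comes from its mean activation magnitude)."""
    T, H, W = diffs.shape
    pooled = []
    for s in scales:
        # average-pool with stride s, then nearest-neighbor upsample
        p = diffs[:, :H - H % s, :W - W % s]
        p = p.reshape(T, H // s, s, W // s, s).mean(axis=(2, 4))
        p = np.repeat(np.repeat(p, s, axis=1), s, axis=2)
        pad_h, pad_w = H - p.shape[1], W - p.shape[2]
        p = np.pad(p, ((0, 0), (0, pad_h), (0, pad_w)), mode="edge")
        pooled.append(p)
    energies = np.array([np.abs(p).mean() for p in pooled])
    weights = np.exp(energies) / np.exp(energies).sum()  # softmax over scales
    return sum(w * p for w, p in zip(weights, pooled))

frames = np.random.rand(5, 16, 16)   # 5 event frames of size 16x16
diffs = temporal_difference_maps(frames)
fused = multiscale_attention_pool(diffs)
print(diffs.shape, fused.shape)      # (4, 16, 16) (4, 16, 16)
```

In the paper's actual design, these high-resolution difference motion features would then be adaptively fused with low-resolution correlation (cost-volume) features; the sketch only covers the difference-map side.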

Abstract (translated)

Recent learning-based methods for event-based optical flow estimation use cost volumes for pixel matching, but still face redundant computation and difficulty scaling to higher resolutions. In this work, we exploit the complementarity between the temporally dense feature differences of adjacent event frames and the cost volume, and propose a lightweight event-based optical flow network (EDCFlow) aimed at high-quality flow estimation at high resolution. Specifically, we develop an attention-based multi-scale temporal feature difference layer to efficiently capture diverse motion patterns at high resolution. In addition, an adaptive fusion combines high-resolution difference motion features with low-resolution correlation motion features to strengthen motion representation and model generalization. Notably, EDCFlow can serve as a plug-in refinement module for RAFT-like methods to improve flow detail. Extensive experiments show that, compared with existing methods, EDCFlow achieves a better balance between performance and complexity and offers superior generalization.

URL

https://arxiv.org/abs/2506.03512

PDF

https://arxiv.org/pdf/2506.03512.pdf

