
Learning Optical Flow Field via Neural Ordinary Differential Equation

2025-06-03 18:30:14
Leyla Mirvakhabova, Hong Cai, Jisoo Jeong, Hanno Ackermann, Farhad Zanjani, Fatih Porikli

Abstract

Recent works on optical flow estimation use neural networks to predict the flow field that maps positions in one image to positions in the other. These networks consist of a feature extractor, a correlation volume, and several refinement steps. The refinement steps mimic the iterative updates of classical optimization algorithms and are usually implemented with neural layers (e.g., a GRU) that are executed recurrently for a fixed, pre-determined number of steps. Relying on a fixed number of steps, however, can lead to suboptimal performance because the computation is not tailored to the input data. In this paper, we introduce a novel approach that predicts the derivative of the flow with a continuous model, namely a neural ordinary differential equation (ODE). A key advantage of this approach is its capacity to model an equilibrium process, dynamically adjusting the number of compute steps to the data at hand. With a particular choice of neural architecture, ODE solver, and associated hyperparameters, the proposed model can replicate exactly the same updates as the recurrent cells used in existing works, and is therefore more general. Through extensive experiments on optical flow benchmarks, we demonstrate that our approach achieves a substantial improvement over baseline and existing models while requiring only a single refinement step.
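
The core idea described above, replacing a fixed number of recurrent (GRU) refinement iterations with a continuous-time model whose solver decides how many steps to take, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the use of the `torchdiffeq` package, the `FlowDerivative` module, the feature dimensions, and the solver settings are all assumptions made for the sketch.

```python
# Minimal sketch (not the paper's code) of refining an optical flow field with a
# neural ODE instead of a fixed number of GRU iterations. Assumes `torchdiffeq`;
# module names, channel sizes, and tolerances are illustrative placeholders.
import torch
import torch.nn as nn
from torchdiffeq import odeint  # pip install torchdiffeq


class FlowDerivative(nn.Module):
    """Predicts d(flow)/dt from the current flow and frozen context features."""

    def __init__(self, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 + feat_dim, 128, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 2, 3, padding=1),  # 2 channels: (dx, dy) velocity
        )
        self.context = None  # set before integration (e.g., context/correlation features)

    def forward(self, t, flow):
        # The solver calls this with the current integration time `t` and state `flow`.
        return self.net(torch.cat([flow, self.context], dim=1))


def refine_flow(flow_init, context, deriv_fn, t_end=1.0):
    """Integrate the flow from t=0 to t=t_end with an adaptive ODE solver.

    The solver chooses its own number of internal steps per input, in contrast to a
    GRU unrolled for a fixed, pre-determined number of refinement iterations.
    """
    deriv_fn.context = context
    t = torch.tensor([0.0, t_end])
    # dopri5 is an adaptive Runge-Kutta method; rtol/atol control the step count.
    flow_traj = odeint(deriv_fn, flow_init, t, rtol=1e-4, atol=1e-4, method="dopri5")
    return flow_traj[-1]  # flow at the final time


if __name__ == "__main__":
    b, h, w = 1, 46, 62                     # a 1/8-resolution grid, RAFT-style
    deriv_fn = FlowDerivative(feat_dim=128)
    flow0 = torch.zeros(b, 2, h, w)         # zero-initialized flow
    ctx = torch.randn(b, 128, h, w)         # stand-in for context/correlation features
    flow = refine_flow(flow0, ctx, deriv_fn)
    print(flow.shape)                       # torch.Size([1, 2, 46, 62])
```

As a design note, a fixed-step explicit Euler solver with unit step size would turn each integration step into a residual update of the form `flow ← flow + f(flow, context)`, which is one way to see how a recurrent refinement cell arises as a special case of the ODE formulation, as the abstract suggests.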


URL

https://arxiv.org/abs/2506.03290

PDF

https://arxiv.org/pdf/2506.03290.pdf

