Paper Reading AI Learner

A Lightweight Optical Flow CNN -- Revisiting Data Fidelity and Regularization

2019-03-15 04:20:58
Tak-Wai Hui, Xiaoou Tang, Chen Change Loy

Abstract

For over four decades, the majority of works have addressed optical flow estimation using variational methods. With the advance of machine learning, some recent works have attempted to address the problem using convolutional neural networks (CNNs) and have shown promising results. FlowNet2, the state-of-the-art CNN, requires over 160M parameters to achieve accurate flow estimation. Our LiteFlowNet2 outperforms FlowNet2 on the Sintel and KITTI benchmarks while being 25.3 times smaller in footprint and 3.1 times faster in running speed. Built on the foundation laid by conventional methods, LiteFlowNet2 marks a milestone in achieving the roles corresponding to data fidelity and regularization in variational methods. We present an effective flow inference approach at each pyramid level through a novel lightweight cascaded network. It provides high flow estimation accuracy through early correction with seamless incorporation of descriptor matching. A novel flow regularization layer ameliorates the issue of outliers and vague flow boundaries through a novel feature-driven local convolution. Our network also has an effective structure for pyramidal feature extraction and embraces feature warping rather than the image warping practiced in FlowNet2. Compared to our earlier work, LiteFlowNet2 improves the optical flow accuracy on the Sintel clean pass by 24%, the Sintel final pass by 8.9%, KITTI 2012 by 16.8%, and KITTI 2015 by 17.5%. Our network protocol and trained models will be made publicly available at https://github.com/twhui/LiteFlowNet2 .
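The matching step described in the abstract (descriptor matching on warped features at one pyramid level) can be illustrated with a minimal NumPy sketch. It assumes feature maps of shape (C, H, W) and a flow field of shape (2, H, W) with horizontal displacement first. The function names `backward_warp`, `cost_volume`, and `refine_flow_wta` are hypothetical, and the winner-take-all argmax is a deliberate simplification standing in for the learned CNN that regresses the residual flow. This is an illustration of the general technique, not the authors' implementation.

```python
import numpy as np


def backward_warp(feat, flow):
    """Bilinearly sample feat (C, H, W) at x + flow, where flow is (2, H, W).

    flow[0] holds horizontal (u) and flow[1] vertical (v) displacements.
    Samples that fall outside the map contribute zero.
    """
    C, H, W = feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    x = xs + flow[0]
    y = ys + flow[1]
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    wx, wy = x - x0, y - y0  # fractional parts = weights of right/lower neighbours
    out = np.zeros(feat.shape, dtype=float)
    for yy, xx, w in ((y0, x0, (1 - wy) * (1 - wx)),
                      (y0, x0 + 1, (1 - wy) * wx),
                      (y0 + 1, x0, wy * (1 - wx)),
                      (y0 + 1, x0 + 1, wy * wx)):
        inside = (xx >= 0) & (xx < W) & (yy >= 0) & (yy < H)
        # Clip indices so the gather is safe; `inside` zeroes out-of-range taps.
        out += feat[:, np.clip(yy, 0, H - 1), np.clip(xx, 0, W - 1)] * (w * inside)
    return out


def cost_volume(f1, f2w, max_disp):
    """Correlate f1(x) with f2w(x + d) for every d in a (2*max_disp+1)^2 window."""
    C, H, W = f1.shape
    r = max_disp
    padded = np.pad(f2w, ((0, 0), (r, r), (r, r)))  # zero padding
    costs = []
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            shifted = padded[:, r + dy:r + dy + H, r + dx:r + dx + W]
            costs.append((f1 * shifted).mean(axis=0))  # channel-averaged dot product
    return np.stack(costs)  # shape ((2r+1)^2, H, W), ordered dy-major


def refine_flow_wta(f1, f2, flow, max_disp=3):
    """One matching step: warp f2 features, correlate, add the best residual.

    Winner-take-all argmax is an assumed stand-in for a learned flow decoder.
    """
    r = max_disp
    cost = cost_volume(f1, backward_warp(f2, flow), r)
    best = cost.argmax(axis=0)  # index into the dy-major displacement list
    dy = best // (2 * r + 1) - r
    dx = best % (2 * r + 1) - r
    return flow + np.stack([dx, dy]).astype(float)


# Usage: synthetic features where f1(x, y) = f2(x + 2, y + 1), i.e. u = 2, v = 1.
rng = np.random.default_rng(0)
f2 = rng.standard_normal((256, 16, 16))
f1 = np.roll(f2, shift=(-1, -2), axis=(1, 2))
flow = refine_flow_wta(f1, f2, np.zeros((2, 16, 16)))
print(flow[:, 8, 8])  # interior pixels should recover u = 2, v = 1
```

Because the second frame's features (rather than its pixels) are warped, the correlation compares descriptors directly. The argmax only yields integer residuals within ±`max_disp`; a learned decoder, as used in the paper, can instead predict sub-pixel corrections.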

URL

https://arxiv.org/abs/1903.07414

PDF

https://arxiv.org/pdf/1903.07414.pdf

