Comparing Correspondences: Video Prediction with Correspondence-wise Losses

Abstract

Today's image prediction methods struggle to change the locations of objects in a scene, producing blurry images that average over the many positions they might occupy. In this paper, we propose a simple change to existing image similarity metrics that makes them more robust to positional errors: we match the images using optical flow, then measure the visual similarity of corresponding pixels. This change leads to crisper and more perceptually accurate predictions, and can be used with any image prediction network. We apply our method to predicting future frames of a video, where it obtains strong performance with simple, off-the-shelf architectures.
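
As a rough illustration of the idea described above, the sketch below compares each predicted pixel with its flow-matched pixel in the target rather than with the pixel at the same location. It is a minimal NumPy sketch under several assumptions that are not taken from the paper: the flow field is supplied by the caller instead of being estimated, matching uses nearest-neighbour sampling (a training loss would need a differentiable sampler such as bilinear interpolation), and L1 stands in for whatever base similarity metric is used. The names warp_with_flow, l1_loss, and correspondence_wise_l1 are hypothetical.

```python
import numpy as np


def warp_with_flow(image, flow):
    """Sample `image` at positions displaced by `flow` (nearest neighbour).

    image: (H, W) or (H, W, C) array.
    flow:  (H, W, 2) array; flow[y, x] = (dx, dy) points from pixel (x, y)
           to its assumed match in `image`.
    """
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # Round to the nearest pixel and clamp to the image bounds.
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return image[src_y, src_x]


def l1_loss(a, b):
    """Plain pixel-wise L1: compares pixels at identical locations."""
    return float(np.mean(np.abs(a - b)))


def correspondence_wise_l1(pred, target, flow):
    """L1 between each predicted pixel and its flow-matched target pixel."""
    return l1_loss(pred, warp_with_flow(target, flow))


# Toy example: the prediction places a bright square one pixel too far left.
target = np.zeros((8, 8))
target[3:5, 3:5] = 1.0
pred = np.zeros((8, 8))
pred[3:5, 2:4] = 1.0

# Assumed flow from prediction to target: every pixel matches one pixel right.
flow = np.zeros((8, 8, 2))
flow[..., 0] = 1.0

plain = l1_loss(pred, target)                         # penalises the shift
matched = correspondence_wise_l1(pred, target, flow)  # compares matched pixels
```

In this toy case the plain L1 charges the prediction for the one-pixel offset, while the matched version sees identical appearance at corresponding pixels. In practice the flow would come from an optical flow estimator run on the predicted and target images, which this sketch does not attempt.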

URL

https://arxiv.org/abs/2104.09498

PDF

https://arxiv.org/pdf/2104.09498.pdf