Spatial-Temporal Residual Aggregation for High Resolution Video Inpainting

2021-11-05 15:50:31

Vishnu Sanjay Ramiya Srinivasan, Rui Ma, Qiang Tang, Zili Yi, Zhan Xu

arXiv_CV

Abstract

Recent learning-based inpainting algorithms have achieved compelling results for completing missing regions after removing undesired objects in videos. To maintain temporal consistency among the frames, deep networks for this task often rely heavily on 3D spatial and temporal operations. However, these methods usually suffer from memory constraints and can only handle low-resolution videos. We propose STRA-Net, a novel spatial-temporal residual aggregation framework for high-resolution video inpainting. The key idea is to first learn and apply a spatial and temporal inpainting network on the downsampled low-resolution videos. Then, we refine the low-resolution results by aggregating the learned spatial and temporal image residuals (details) to the upsampled inpainted frames. Both quantitative and qualitative evaluations show that we can produce more temporally coherent and visually appealing results than the state-of-the-art methods on inpainting high-resolution videos.
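
The two-stage idea in the abstract (inpaint at low resolution, then add back high-frequency residuals to the upsampled result) can be illustrated with a minimal NumPy sketch. This is not the STRA-Net implementation: the abstract describes a learned inpainting network and learned residual aggregation, whereas the sketch below substitutes plain temporal averaging for both. All function names (`downsample`, `upsample`, `inpaint_lowres`, `inpaint_highres`) and the scale factor `s` are hypothetical, and frames are assumed to be single-channel arrays whose height and width are divisible by `s`.

```python
import numpy as np


def downsample(x, s):
    """Box-filter downsample of an (H, W) array by an integer factor s."""
    h, w = x.shape
    return x.reshape(h // s, s, w // s, s).mean(axis=(1, 3))


def upsample(x, s):
    """Nearest-neighbour upsample of an (h, w) array by an integer factor s."""
    return np.repeat(np.repeat(x, s, axis=0), s, axis=1)


def inpaint_lowres(frames_lr, masks_lr):
    """Hypothetical stand-in for the learned low-resolution network.

    Each missing low-res pixel is filled with the temporal mean of the
    values observed at that location in frames where it is known.
    """
    known = 1.0 - masks_lr
    count = known.sum(axis=0)
    temporal_mean = (frames_lr * known).sum(axis=0) / np.maximum(count, 1e-8)
    return np.where(masks_lr > 0, temporal_mean[None], frames_lr)


def inpaint_highres(frames, masks, s=4):
    """frames, masks: (T, H, W) float arrays; mask value 1 marks a missing pixel."""
    # Stage 1: inpaint on the downsampled video. A low-res pixel counts as
    # missing if any high-res pixel inside its s x s block is missing.
    frames_lr = np.stack([downsample(f, s) for f in frames])
    masks_lr = np.stack([downsample(m, s) > 0 for m in masks]).astype(float)
    filled_lr = inpaint_lowres(frames_lr, masks_lr)

    # Stage 2: upsample the coarse result and add back image residuals,
    # i.e. the detail that is lost by downsampling then upsampling.
    coarse = np.stack([upsample(f, s) for f in filled_lr])
    blurred = np.stack([upsample(f, s) for f in frames_lr])
    residual = frames - blurred

    # Residuals are only trustworthy in blocks that contain no missing pixel.
    valid = 1.0 - np.stack([upsample(m, s) for m in masks_lr])

    # Simplified temporal aggregation: average valid residuals across frames.
    agg = (residual * valid).sum(axis=0) / np.maximum(valid.sum(axis=0), 1e-8)

    # Known pixels are kept as-is; holes get coarse content plus residual.
    return np.where(masks > 0, coarse + agg[None], frames)
```

Under these assumptions, only the low-resolution stage needs to reason over the whole clip, while the high-resolution stage reduces to per-pixel arithmetic on residuals. For a static scene in which every block is visible in at least one frame, this toy version recovers the missing pixels up to floating-point error; real videos with motion would need the alignment and learned aggregation that the sketch omits.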

URL

https://arxiv.org/abs/2111.03574

PDF

https://arxiv.org/pdf/2111.03574.pdf