Abstract
Implicit neural representations (INRs) have found successful applications across diverse domains. To deploy INRs in real-world settings, it is important to speed up training. In the field of INRs for video, the state-of-the-art approach employs a grid-type parametric encoding and achieves faster encoding than its predecessors. However, the grid does not account for the video's dynamic nature, leading to redundant use of trainable parameters. As a result, it has significantly lower parameter efficiency and a higher bitrate than NeRV-style methods that do not use a parametric encoding. To address this problem, we propose Neural Video representation with Temporally coherent Modulation (NVTM), a novel framework that captures the dynamic characteristics of video. By decomposing the spatio-temporal 3D video data into a set of 2D grids with flow information, NVTM learns video representations rapidly and uses parameters efficiently. Our framework processes temporally corresponding pixels at once, resulting in the fastest encoding speed for a reasonable video quality, more than 3 times faster than the NeRV-style method. It also achieves an average improvement of 1.54 dB/0.019 in PSNR/LPIPS on UVG (Dynamic) (even with 10% fewer parameters) and 1.84 dB/0.013 in PSNR/LPIPS on MCL-JCV (Dynamic), compared to previous grid-type works. By extending NVTM to compression tasks, we demonstrate performance comparable to video compression standards (H.264, HEVC) and recent INR approaches for video compression. Additionally, extensive experiments demonstrate the superior performance of our algorithm across diverse tasks, encompassing super resolution, frame interpolation, and video inpainting. Project page is this https URL.
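The core idea stated above, decomposing the spatio-temporal 3D video volume into flow-aligned 2D grids so that temporally corresponding pixels share grid parameters, can be illustrated with a minimal sketch. This is not the paper's implementation: the flow model (a toy global per-frame translation), the function names `flow_to_canonical` and `sample_grid`, and all shapes are illustrative assumptions.

```python
import numpy as np

def flow_to_canonical(x, y, t, flow):
    """Warp pixel (x, y) at time t back to canonical 2D grid coordinates.
    Here `flow` is a toy per-frame global translation, not a learned field."""
    dx, dy = flow[t]
    return x - dx, y - dy

def sample_grid(grid, u, v):
    """Bilinearly sample a 2D feature grid at continuous coords (u, v)."""
    h, w, _ = grid.shape
    u, v = np.clip(u, 0, w - 1), np.clip(v, 0, h - 1)
    u0, v0 = int(np.floor(u)), int(np.floor(v))
    u1, v1 = min(u0 + 1, w - 1), min(v0 + 1, h - 1)
    au, av = u - u0, v - v0
    return ((1 - au) * (1 - av) * grid[v0, u0]
            + au * (1 - av) * grid[v0, u1]
            + (1 - au) * av * grid[v1, u0]
            + au * av * grid[v1, u1])

# Toy example: a pixel moving 1 px/frame to the right maps back to the same
# canonical cell at every timestep, so one set of 2D grid parameters covers
# the whole trajectory instead of one 3D grid entry per (x, y, t).
rng = np.random.default_rng(0)
grid = rng.standard_normal((8, 8, 4))          # shared 2D modulation grid
flow = {t: (float(t), 0.0) for t in range(3)}  # global motion: +1 px/frame

feats = []
for t in range(3):
    u, v = flow_to_canonical(2 + t, 3, t, flow)  # track the pixel over time
    feats.append(sample_grid(grid, u, v))

# All timesteps resolve to the same canonical location -> identical features.
assert all(np.allclose(f, feats[0]) for f in feats)
```

In a full system these sampled features would modulate an MLP that predicts the pixel color; the sketch only shows why flow-aligned 2D grids avoid the parameter redundancy of a naive 3D grid.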
URL
https://arxiv.org/abs/2505.00335