Abstract
Video motion magnification is a technique for capturing and amplifying subtle motions in a video that are invisible to the naked eye. Prior deep learning-based work successfully models the motion magnification problem with outstanding quality compared to conventional signal processing-based approaches. However, it still falls short of real-time performance, which prevents it from being extended to various online applications. In this paper, we investigate an efficient deep learning-based motion magnification model that runs in real time on full-HD videos. Because of the specialized, inhomogeneous network design of the prior art, directly applying existing neural architecture search methods is complicated. Instead of automatic search, we carefully examine the architecture module by module to determine each module's role and importance in the motion magnification task. Our two key findings are: 1) reducing the spatial resolution of the latent motion representation in the decoder provides a good trade-off between computational efficiency and task quality, and 2) surprisingly, a single linear layer and a single branch in the encoder are sufficient for the motion magnification task. Based on these findings, we introduce a real-time deep learning-based motion magnification model with 4.2X fewer FLOPs that runs 2.7X faster than the prior art while maintaining comparable quality.
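The abstract's two findings can be illustrated with a toy pipeline. The sketch below is not the paper's actual model; it is a minimal linear analogue, assuming a single linear layer as the encoder (finding 2) and a latent motion representation at reduced spatial resolution, simulated here by 2x average pooling (finding 1). Magnification is done by linearly amplifying the latent difference between two frames.

```python
import numpy as np

# Hedged toy sketch (NOT the paper's code): a linear motion-magnification
# pipeline illustrating the abstract's two findings.
rng = np.random.default_rng(0)

def avg_pool2(x):
    """2x2 average pooling: reduces spatial resolution of the latent (finding 1)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

H = W = 8
D = (H // 2) * (W // 2)               # latent dimension after pooling
W_enc = rng.standard_normal((D, D))   # single linear encoder layer (finding 2)
W_dec = np.linalg.inv(W_enc)          # toy decoder: exact linear inverse

def encode(frame):
    return W_enc @ avg_pool2(frame).ravel()

def magnify(frame_a, frame_b, alpha):
    """Amplify the latent motion between two frames by factor alpha."""
    z_a, z_b = encode(frame_a), encode(frame_b)
    z_mag = z_a + alpha * (z_b - z_a)  # linear magnification in latent space
    return (W_dec @ z_mag).reshape(H // 2, W // 2)

frame_a = rng.standard_normal((H, W))
frame_b = frame_a + 0.01 * rng.standard_normal((H, W))  # subtle "motion"
out = magnify(frame_a, frame_b, alpha=10.0)

# Because encoder and decoder are exact linear inverses here, latent
# magnification equals magnifying the pooled-frame difference directly.
expected = avg_pool2(frame_a) + 10.0 * (avg_pool2(frame_b) - avg_pool2(frame_a))
```

In the real model the encoder and decoder are learned convolutional networks, so this equivalence only holds approximately; the sketch just shows why a single linear encoder branch can carry the latent-difference magnification.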
URL
https://arxiv.org/abs/2403.01898