Abstract
The success of deep learning models has led to their adaptation and adoption by prominent video understanding methods. The majority of these approaches encode features in a joint space-time modality for which the inner workings and learned representations are difficult to visually interpret. We propose LEArned Preconscious Synthesis (LEAPS), an architecture-agnostic method for synthesizing videos from the internal spatiotemporal representations of models. Using a stimulus video and a target class, we prime a fixed space-time model and iteratively optimize a video initialized with random noise. We incorporate additional regularizers to improve the feature diversity of the synthesized videos as well as the cross-frame temporal coherence of motions. We quantitatively and qualitatively evaluate the applicability of LEAPS by inverting a range of spatiotemporal convolutional and attention-based architectures trained on Kinetics-400, which, to the best of our knowledge, has not been previously accomplished.
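The optimization loop the abstract describes can be sketched roughly as follows. This is a minimal, hypothetical illustration assuming a PyTorch video classifier over (N, C, T, H, W) clips; the function name, hyperparameters, and the simple frame-difference penalty (a crude stand-in for the paper's feature-diversity and temporal-coherence regularizers, and omitting the stimulus-based priming of internal features) are our assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def invert_video_model(model, stimulus, target_class, steps=500, lr=0.05,
                       coherence_weight=1e-4, l2_weight=1e-5):
    """Hypothetical sketch: optimize a noise-initialized video so that a
    frozen spatiotemporal model classifies it as `target_class`."""
    model.eval()
    for p in model.parameters():
        p.requires_grad_(False)  # the space-time model stays fixed

    # Video tensor shaped like the stimulus clip: (N, C, T, H, W).
    # Here the stimulus supplies only the shape; LEAPS additionally
    # uses it to prime the model's internal representations.
    video = torch.randn_like(stimulus, requires_grad=True)
    optimizer = torch.optim.Adam([video], lr=lr)

    for _ in range(steps):
        optimizer.zero_grad()
        logits = model(video)
        target = torch.full((logits.shape[0],), target_class,
                            dtype=torch.long, device=logits.device)
        class_loss = F.cross_entropy(logits, target)

        # Penalize frame-to-frame differences along the time axis as a
        # simple proxy for cross-frame temporal coherence of motions.
        coherence = (video[:, :, 1:] - video[:, :, :-1]).abs().mean()

        loss = (class_loss
                + coherence_weight * coherence
                + l2_weight * video.pow(2).mean())
        loss.backward()
        optimizer.step()

    return video.detach()
```

As a usage example, such a sketch could be exercised with a Kinetics-400 pretrained backbone, e.g. torchvision's r3d_18, and a random 16-frame clip of shape (1, 3, 16, 112, 112) as the stimulus; the actual method applies the same idea across a range of convolutional and attention-based architectures.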
URL
https://arxiv.org/abs/2303.09941