Abstract
Temporal modeling is crucial for multi-frame human pose estimation. Most existing methods directly employ optical flow or deformable convolution to predict full-spectrum motion fields, which may incur numerous irrelevant cues, such as a nearby person or the background. Without further effort to excavate meaningful motion priors, their results are suboptimal, especially under complicated spatiotemporal interactions. Temporal differences, on the other hand, can encode representative motion information that is potentially valuable for pose estimation but has not been fully exploited. In this paper, we present a novel multi-frame human pose estimation framework that employs temporal differences across frames to model dynamic contexts and engages a mutual information objective to facilitate the disentanglement of useful motion information. Specifically, we design a multi-stage Temporal Difference Encoder that performs incremental cascaded learning conditioned on multi-stage feature difference sequences to derive an informative motion representation. We further propose a Representation Disentanglement module from the mutual information perspective, which grasps discriminative, task-relevant motion signals by explicitly defining the useful and noisy constituents of the raw motion features and minimizing their mutual information. This approach ranks No. 1 in the Crowd Pose Estimation in Complex Events Challenge on the HiEve benchmark, and achieves state-of-the-art performance on three benchmarks: PoseTrack2017, PoseTrack2018, and PoseTrack21.
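To make the "incremental cascaded learning conditioned on multi-stage feature difference sequences" idea concrete, here is a minimal, hypothetical sketch in plain Python. It is not the authors' implementation (which operates on convolutional feature maps); feature vectors stand in for per-frame, per-stage features, and the cascade is modeled as simply carrying the previous stage's motion summary into the next stage's fused output. The function names `frame_differences` and `cascaded_motion` are illustrative, not from the paper.

```python
# Hedged sketch (not the paper's code): cascaded temporal-difference
# encoding over multi-stage features. Each stage holds one feature
# vector per frame; motion cues are the differences between
# consecutive frames, fused incrementally across stages.

def frame_differences(frames):
    """Temporal differences between consecutive per-frame feature vectors."""
    return [[b - a for a, b in zip(f0, f1)]
            for f0, f1 in zip(frames, frames[1:])]

def cascaded_motion(stages):
    """Incrementally fuse per-stage difference sequences into one motion vector.

    `stages` is a list of stages; each stage is a list of per-frame
    feature vectors (all the same dimensionality, for simplicity).
    """
    fused = None
    for frames in stages:
        diffs = frame_differences(frames)  # one diff per consecutive frame pair
        # collapse the difference sequence into a single motion vector
        motion = [sum(d[i] for d in diffs) for i in range(len(diffs[0]))]
        if fused is None:
            fused = motion
        else:
            # cascade: carry the previous stage's motion into this stage
            fused = [m + p for m, p in zip(motion, fused)]
    return fused
```

In the actual model, the per-stage fusion would be a learned module rather than elementwise addition; the sketch only shows the data flow of differencing followed by stagewise accumulation.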
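The Representation Disentanglement idea (splitting raw motion features into useful and noisy constituents and minimizing their mutual information) can be illustrated with a toy loss. This is a loose sketch under strong assumptions: the paper uses a learned, neural mutual information estimator, whereas here squared Pearson correlation between the two halves of each feature serves as a crude dependence proxy, purely to show the objective's shape. All names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    vy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (vx * vy)

def disentangle_loss(batch):
    """Toy disentanglement objective: split each motion feature into a
    'useful' half and a 'noisy' half, and penalize statistical
    dependence between aligned dimensions (squared correlation as a
    crude stand-in for mutual information)."""
    half = len(batch[0]) // 2
    useful = [f[:half] for f in batch]
    noisy = [f[half:] for f in batch]
    losses = []
    for d in range(half):
        u = [f[d] for f in useful]
        v = [f[d] for f in noisy]
        losses.append(pearson(u, v) ** 2)
    return sum(losses) / len(losses)
```

Driving this loss toward zero pushes the two constituents toward independence, which is the intuition behind minimizing their mutual information; the useful half is additionally supervised by the pose estimation task so that task-relevant motion ends up there.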
URL
https://arxiv.org/abs/2303.08475