Abstract
Skeleton-based gait recognizers excel at modeling spatial configurations but often underexploit the explicit motion dynamics that become crucial under appearance changes. We introduce a plug-and-play Wavelet Feature Stream that augments any skeleton backbone with the time-frequency dynamics of joint velocities. Concretely, each per-joint velocity sequence is transformed by the continuous wavelet transform (CWT) into a multi-scale scalogram, from which a lightweight multi-scale CNN learns discriminative dynamic cues. The resulting descriptor is fused with the backbone representation for classification, requiring neither changes to the backbone architecture nor additional supervision. On CASIA-B, the proposed stream delivers consistent gains on strong skeleton backbones (e.g., GaitMixer, GaitFormer, GaitGraph) and establishes a new skeleton-based state of the art when attached to GaitMixer. The improvements are especially pronounced under covariate shifts such as carrying bags (BG) and wearing coats (CL), highlighting the complementarity of explicit time-frequency modeling and standard spatio-temporal encoders.
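The front end of the stream (joint velocities, then a CWT scalogram per joint) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the wavelet family, the scale grid, or the velocity definition, so the complex Morlet wavelet with `w0 = 6`, the dyadic scales, the first-order frame difference, and the function names `joint_velocities`, `morlet_cwt`, and `velocity_scalograms` are all assumptions made for illustration. The multi-scale CNN and the fusion step are left out.

```python
import numpy as np


def joint_velocities(joints):
    """Frame-to-frame difference (assumed velocity definition).

    joints: array of shape (T, J, C) -> velocities of shape (T-1, J, C).
    """
    return np.diff(joints, axis=0)


def morlet_cwt(signal, scales, w0=6.0):
    """Magnitude of a CWT with a complex Morlet wavelet (assumed wavelet).

    signal: 1-D array of length N. Returns an array of shape (len(scales), N).
    """
    signal = np.asarray(signal, dtype=float)
    n = signal.shape[0]
    out = np.empty((len(scales), n))
    for i, s in enumerate(scales):
        # Sample the wavelet on roughly +/- 4 scale units of support.
        half = int(np.ceil(4 * s))
        t = np.arange(-half, half + 1) / s
        wavelet = np.pi ** -0.25 * np.exp(1j * w0 * t - 0.5 * t ** 2) / np.sqrt(s)
        # The CWT correlates the signal with the conjugated wavelet, which is
        # a convolution with the conjugated, time-reversed wavelet.
        full = np.convolve(signal, np.conj(wavelet)[::-1], mode="full")
        # Crop the centre so the output is aligned with the input samples.
        out[i] = np.abs(full[half:half + n])
    return out


def velocity_scalograms(joints, scales):
    """One scalogram per joint coordinate: output shape (J, C, S, T-1)."""
    vel = joint_velocities(joints)
    n, num_joints, num_coords = vel.shape
    out = np.empty((num_joints, num_coords, len(scales), n))
    for j in range(num_joints):
        for c in range(num_coords):
            out[j, c] = morlet_cwt(vel[:, j, c], scales)
    return out


# Hypothetical input: 65 frames, 17 joints, 2-D coordinates (random values).
rng = np.random.default_rng(0)
demo = velocity_scalograms(rng.normal(size=(65, 17, 2)), [2.0, 4.0, 8.0, 16.0])
print(demo.shape)  # (17, 2, 4, 64)
```

Each `(S, T-1)` slice is an image-like map that a small CNN could consume. In practice a library routine such as a wavelet toolbox CWT would likely replace the explicit loop; the loop is kept here only to make the transform visible.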
URL
https://arxiv.org/abs/2604.03002