Abstract
We introduce an on-ground Pedestrian World Model, a computational model that predicts how pedestrians in a crowd move around an observer on the ground plane, using only the observer's egocentric views. Our model, InCrowdFormer, fully leverages the Transformer architecture: it models pedestrian interaction and the egocentric-to-top-down view transformation with attention, and autoregressively predicts the on-ground positions of a variable number of people with an encoder-decoder architecture. We encode the uncertainty arising from unknown pedestrian heights with latent codes to predict posterior distributions of pedestrian positions. We validate the effectiveness of InCrowdFormer on a novel prediction benchmark of real crowd movements. The results show that InCrowdFormer accurately predicts the future coordinates of pedestrians. To the best of our knowledge, InCrowdFormer is the first pedestrian world model of its kind, which we believe will benefit a wide range of egocentric-view applications including crowd navigation, tracking, and synthesis.
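The two core ideas in the abstract, attention over a variable number of pedestrians and autoregressive rollout of future positions, can be sketched with plain scaled dot-product attention. This is a minimal toy illustration under our own assumptions (no learned projections, positions used directly as queries, keys, and values), not the actual InCrowdFormer architecture:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attend(query, keys, values):
    """Scaled dot-product attention over a variable number of pedestrians."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Weighted blend of the value vectors.
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]

def predict_step(positions):
    """One autoregressive step: each pedestrian's next position is an
    attention-weighted blend of all current positions (toy stand-in for
    the Transformer decoder; no learned weights)."""
    return [attend(p, positions, positions) for p in positions]

def rollout(positions, horizon):
    """Autoregressively predict `horizon` future frames of positions."""
    traj = [positions]
    for _ in range(horizon):
        traj.append(predict_step(traj[-1]))
    return traj
```

Because attention operates over a set, the same functions handle any number of pedestrians per frame; the real model would add learned query/key/value projections, the egocentric-to-top-down transformation, and latent codes for height uncertainty on top of this skeleton.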
URL
https://arxiv.org/abs/2303.09534