Abstract
Modeling the dynamics of deformable objects is challenging due to their diverse physical properties and the difficulty of estimating states from limited visual information. We address these challenges with a neural dynamics framework that combines object particles and spatial grids in a hybrid representation. Our particle-grid model captures global shape and motion information while predicting dense particle movements, enabling the modeling of objects with varied shapes and materials. Particles represent object shapes, while the spatial grid discretizes the 3D space to ensure spatial continuity and enhance learning efficiency. Coupled with Gaussian Splatting for visual rendering, our framework achieves a fully learning-based digital twin of deformable objects and generates 3D action-conditioned videos. Through experiments, we demonstrate that our model learns the dynamics of diverse objects -- such as ropes, cloths, stuffed animals, and paper bags -- from sparse-view RGB-D recordings of robot-object interactions, while also generalizing at the category level to unseen instances. Our approach outperforms state-of-the-art learning-based and physics-based simulators, particularly in scenarios with limited camera views. Furthermore, we showcase the utility of our learned models in model-based planning, enabling goal-conditioned object manipulation across a range of tasks. The project page is available at this https URL.
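To make the hybrid particle-grid idea concrete, the sketch below illustrates one dynamics step: particle states are scattered onto a fixed 3D grid, the grid features are processed by a small network conditioned on the robot action, and per-particle displacements are read back from the grid. This is a minimal illustrative sketch, not the paper's implementation: the use of PyTorch, the nearest-cell scatter/gather, the 3D CNN, the grid resolution, and all module names (`ParticleGridDynamics`, `particle_enc`, `grid_net`, `readout`) are assumptions.

```python
# Hypothetical sketch of a particle-grid dynamics step (not the paper's code).
# Assumptions: a unit-cube workspace, nearest-cell particle/grid transfer,
# and a small 3D CNN over grid features conditioned on the robot action.
import torch
import torch.nn as nn


class ParticleGridDynamics(nn.Module):
    def __init__(self, grid_res: int = 32, feat_dim: int = 32, action_dim: int = 6):
        super().__init__()
        self.grid_res = grid_res
        # Per-particle encoder: position + velocity -> grid feature contribution.
        self.particle_enc = nn.Sequential(
            nn.Linear(6, feat_dim), nn.ReLU(), nn.Linear(feat_dim, feat_dim)
        )
        # 3D CNN propagates information across the spatial grid.
        self.grid_net = nn.Sequential(
            nn.Conv3d(feat_dim + action_dim, feat_dim, 3, padding=1), nn.ReLU(),
            nn.Conv3d(feat_dim, feat_dim, 3, padding=1), nn.ReLU(),
        )
        # Readout: gathered grid feature + particle state -> predicted displacement.
        self.readout = nn.Sequential(
            nn.Linear(feat_dim + 6, feat_dim), nn.ReLU(), nn.Linear(feat_dim, 3)
        )

    def _cell_index(self, pos):
        # Map positions in [0, 1]^3 to flat indices of the nearest grid cell.
        ijk = (pos.clamp(0.0, 1.0 - 1e-6) * self.grid_res).long()  # (N, 3)
        r = self.grid_res
        return ijk[:, 0] * r * r + ijk[:, 1] * r + ijk[:, 2]       # (N,)

    def forward(self, pos, vel, action):
        """pos, vel: (N, 3) particle states; action: (action_dim,) robot action."""
        n, r = pos.shape[0], self.grid_res
        state = torch.cat([pos, vel], dim=-1)                      # (N, 6)
        feat = self.particle_enc(state)                            # (N, F)

        # Particle -> grid: scatter-add particle features into cells, then average.
        idx = self._cell_index(pos)
        grid = torch.zeros(r ** 3, feat.shape[-1], device=pos.device)
        count = torch.zeros(r ** 3, 1, device=pos.device)
        grid.index_add_(0, idx, feat)
        count.index_add_(0, idx, torch.ones(n, 1, device=pos.device))
        grid = grid / count.clamp(min=1.0)

        # Process on the grid, conditioned on the action broadcast to every cell.
        grid = grid.t().reshape(1, -1, r, r, r)                    # (1, F, r, r, r)
        act = action.view(1, -1, 1, 1, 1).expand(1, -1, r, r, r)
        grid = self.grid_net(torch.cat([grid, act], dim=1))        # (1, F, r, r, r)

        # Grid -> particle: gather the processed feature at each particle's cell.
        gathered = grid.reshape(grid.shape[1], -1).t()[idx]        # (N, F)
        delta = self.readout(torch.cat([gathered, state], dim=-1)) # (N, 3)
        return pos + delta                                         # next particle positions
```

As a usage example, `ParticleGridDynamics()(torch.rand(500, 3), torch.zeros(500, 3), torch.zeros(6))` advances 500 particles by one step; a rollout would apply this step autoregressively, and the actual model is presumably trained against particles tracked from the RGB-D recordings, details the abstract does not specify.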
Abstract (translated)
Modeling the dynamics of deformable objects is challenging because of their diverse physical properties and the difficulty of estimating their state from limited visual information. We address these challenges with a neural dynamics framework built on a hybrid representation that combines object particles and a spatial grid. Our particle-grid model captures global shape and motion information while predicting dense particle motions, enabling the modeling of objects with varied shapes and materials. In this model, particles represent the object's shape, while the spatial grid discretizes 3D space to ensure spatial continuity and improve learning efficiency. Combined with Gaussian Splatting for visual rendering, our framework achieves a fully learning-based digital twin of deformable objects and generates 3D action-conditioned videos. Through experiments, we show that the model learns the dynamics of diverse objects (such as ropes, cloths, stuffed animals, and paper bags) from sparse-view RGB-D recordings of robot-object interactions, while also generalizing at the category level to unseen instances. Our approach surpasses the best learning-based and physics-engine-based simulators, especially under limited camera views. Furthermore, we demonstrate the utility of the learned models in model-based planning, enabling goal-conditioned object manipulation across a variety of tasks. The project page is available at this https URL.
URL
https://arxiv.org/abs/2506.15680