Paper Reading AI Learner

Particle-Grid Neural Dynamics for Learning Deformable Object Models from RGB-D Videos

2025-06-18 17:59:38
Kaifeng Zhang, Baoyu Li, Kris Hauser, Yunzhu Li

Abstract

Modeling the dynamics of deformable objects is challenging due to their diverse physical properties and the difficulty of estimating states from limited visual information. We address these challenges with a neural dynamics framework that combines object particles and spatial grids in a hybrid representation. Our particle-grid model captures global shape and motion information while predicting dense particle movements, enabling the modeling of objects with varied shapes and materials. Particles represent object shapes, while the spatial grid discretizes the 3D space to ensure spatial continuity and enhance learning efficiency. Coupled with Gaussian Splatting for visual rendering, our framework achieves a fully learning-based digital twin of deformable objects and generates 3D action-conditioned videos. Through experiments, we demonstrate that our model learns the dynamics of diverse objects -- such as ropes, cloths, stuffed animals, and paper bags -- from sparse-view RGB-D recordings of robot-object interactions, while also generalizing at the category level to unseen instances. Our approach outperforms state-of-the-art learning-based and physics-based simulators, particularly in scenarios with limited camera views. Furthermore, we showcase the utility of our learned models in model-based planning, enabling goal-conditioned object manipulation across a range of tasks. The project page is available at this https URL.
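The hybrid representation scatters per-particle information onto a regular 3D grid, which enforces spatial continuity across nearby particles. The sketch below illustrates this kind of particle-to-grid transfer with trilinear weights, as used in particle-in-cell-style schemes; it is a minimal illustration under assumed names and parameters, not the paper's learned architecture.

```python
import numpy as np

def particles_to_grid(positions, features, grid_res, bounds):
    """Scatter per-particle feature vectors onto a regular 3D grid
    using trilinear interpolation weights (illustrative sketch)."""
    lo, hi = bounds
    cell = (hi - lo) / (grid_res - 1)          # grid spacing per axis
    grid = np.zeros((grid_res,) * 3 + (features.shape[1],))
    weight = np.zeros((grid_res,) * 3)

    idx = (positions - lo) / cell               # positions in grid-index space
    base = np.floor(idx).astype(int)            # lower corner of enclosing cell
    frac = idx - base                           # fractional offset in the cell

    # accumulate into the 8 surrounding grid nodes
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = (np.where(dx, frac[:, 0], 1 - frac[:, 0]) *
                     np.where(dy, frac[:, 1], 1 - frac[:, 1]) *
                     np.where(dz, frac[:, 2], 1 - frac[:, 2]))
                gx = np.clip(base[:, 0] + dx, 0, grid_res - 1)
                gy = np.clip(base[:, 1] + dy, 0, grid_res - 1)
                gz = np.clip(base[:, 2] + dz, 0, grid_res - 1)
                # unbuffered scatter-add: repeated indices accumulate correctly
                np.add.at(grid, (gx, gy, gz), w[:, None] * features)
                np.add.at(weight, (gx, gy, gz), w)

    # normalize occupied nodes so each holds a weighted feature average
    occupied = weight > 1e-8
    grid[occupied] /= weight[occupied][:, None]
    return grid, weight
```

A symmetric grid-to-particle gather (interpolating grid values back at particle positions with the same weights) would complete the round trip; the trilinear weights for each particle sum to one, so total mass on the grid equals the particle count.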


URL

https://arxiv.org/abs/2506.15680

PDF

https://arxiv.org/pdf/2506.15680.pdf
