Paper Reading AI Learner

Hand-Object Interaction Controller: Deep Reinforcement Learning for Reconstructing Interactions with Physics

2024-05-04 14:32:13
Haoyu Hu, Xinyu Yi, Zhe Cao, Jun-Hai Yong, Feng Xu

Abstract

Hand manipulation of objects is an important interaction motion in our daily activities. We faithfully reconstruct this motion from a single RGBD camera with a novel deep reinforcement learning method that leverages physics. First, we propose object compensation control, which establishes direct control over the object to make network training more stable. Meanwhile, by leveraging the compensation force and torque, we seamlessly upgrade the simple point contact model to a more physically plausible surface contact model, further improving reconstruction accuracy and physical correctness. Experiments indicate that, without relying on any heuristic physical rules, this work still successfully incorporates physics into the reconstruction of hand-object interactions, which are complex motions that are hard to imitate with deep reinforcement learning. Our code and data are available at this https URL.
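The abstract describes object compensation control only at a high level. As a rough illustration, and assuming a single rigid object stepped with semi-implicit Euler (orientation and the gyroscopic term omitted, inertia taken as diagonal in the world frame), the sketch below applies a compensation force and torque directly to the object on top of ordinary point-contact forces. All names (`ObjectState`, `net_wrench`, `step`, `comp_force`, `comp_torque`) are hypothetical. This is not the authors' implementation, only one way to picture "direct object control" under those assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def scale(a: Vec3, s: float) -> Vec3:
    return (a[0] * s, a[1] * s, a[2] * s)


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


@dataclass
class ObjectState:
    """Hypothetical rigid-object state; orientation is omitted for brevity."""
    mass: float
    inertia_diag: Vec3      # assumed diagonal inertia, expressed in the world frame
    position: Vec3          # center of mass
    velocity: Vec3
    angular_velocity: Vec3


def net_wrench(contacts: List[Tuple[Vec3, Vec3]], com: Vec3,
               comp_force: Vec3, comp_torque: Vec3) -> Tuple[Vec3, Vec3]:
    """Sum point-contact forces (each adds torque r x f) plus a compensation wrench."""
    force, torque = comp_force, comp_torque
    for point, f in contacts:
        force = add(force, f)
        torque = add(torque, cross(sub(point, com), f))
    return force, torque


def step(obj: ObjectState, contacts: List[Tuple[Vec3, Vec3]],
         comp_force: Vec3, comp_torque: Vec3, dt: float,
         gravity: Vec3 = (0.0, 0.0, -9.81)) -> ObjectState:
    """One semi-implicit Euler step; ignores the gyroscopic term w x (I w)."""
    force, torque = net_wrench(contacts, obj.position, comp_force, comp_torque)
    force = add(force, scale(gravity, obj.mass))
    lin_acc = scale(force, 1.0 / obj.mass)
    ang_acc = (torque[0] / obj.inertia_diag[0],
               torque[1] / obj.inertia_diag[1],
               torque[2] / obj.inertia_diag[2])
    velocity = add(obj.velocity, scale(lin_acc, dt))
    angular_velocity = add(obj.angular_velocity, scale(ang_acc, dt))
    position = add(obj.position, scale(velocity, dt))  # uses the updated velocity
    return ObjectState(obj.mass, obj.inertia_diag, position, velocity, angular_velocity)


# Usage: with no hand contacts, a compensation force equal to m*g holds the object still.
obj = ObjectState(0.5, (1e-3, 1e-3, 1e-3), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0))
hold = (0.0, 0.0, obj.mass * 9.81)
nxt = step(obj, [], hold, (0.0, 0.0, 0.0), dt=1.0 / 60.0)
```

A point contact at offset r from the center of mass can only contribute the torque r × f, which is always perpendicular to f, so it cannot resist twisting about the contact normal. Adding a free compensation torque removes that restriction, which is one way to read the abstract's upgrade from a point contact model to a surface contact model; the paper's exact formulation may differ.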

URL

https://arxiv.org/abs/2405.02676

PDF

https://arxiv.org/pdf/2405.02676.pdf
