Paper Reading AI Learner

Affordance Blending Networks

2024-04-24 05:07:36
Hakan Aktas, Yukie Nagai, Minoru Asada, Erhan Oztop, Emre Ugur

Abstract

Affordances, a concept rooted in ecological psychology and pioneered by James J. Gibson, have emerged as a fundamental framework for understanding the dynamic relationship between agents and their environments. Moving beyond traditional perceptual and cognitive paradigms, affordances capture the action and effect possibilities that objects offer to agents in a given context. As a theoretical lens, affordances bridge action and effect, providing a nuanced account of the connection between an agent's actions on entities and the effects those actions produce. In this study, we propose a model that unifies object, action, and effect into a single latent representation in a common latent space shared by all affordances, which we call the affordance space. Using this affordance space, our system can generate effect trajectories when an action and an object are given, and can generate action trajectories when effect trajectories and objects are given. In our experiments, we show that the model does not merely learn the behavior of each individual object; instead, it learns the affordance relations shared by objects, which we call equivalences. Beyond simulated experiments, we show that our model can be used for direct imitation in real-world settings. We also propose affordances as a basis for cross-embodiment transfer, linking the actions of different robots. Finally, we introduce selective loss, a solution that allows valid outputs to be generated for nondeterministic model inputs.
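One way to picture the selective-loss idea from the abstract: when an input is nondeterministic (several different outputs are equally valid), a conventional loss pulls the model toward the average of the valid targets, whereas a selective loss scores the prediction only against the closest valid target. The sketch below is a minimal illustration under that reading of the abstract; the function names and the use of mean squared error are illustrative assumptions, not details taken from the paper:

```python
def mse(pred, target):
    # mean squared error between two equal-length trajectories (flat lists)
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def selective_loss(pred, valid_targets):
    # Score the prediction only against the closest valid target, so the
    # model is free to commit to any one of several equally valid outputs.
    return min(mse(pred, t) for t in valid_targets)

# A nondeterministic input with two equally valid effect trajectories:
valid = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

# A prediction that commits to one valid output incurs zero loss,
committed = selective_loss([1.0, 0.0, 0.0], valid)  # -> 0.0

# while the average of the valid targets (what a plain MSE would pull the
# model toward) is penalized under either target.
averaged = selective_loss([0.5, 0.0, 0.5], valid)   # -> 0.5/3, about 0.167
```

Under this scheme the gradient for an ambiguous input comes from a single consistent target rather than a blend, which is one plausible reason such a loss would keep generated trajectories valid.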


URL

https://arxiv.org/abs/2404.15648

PDF

https://arxiv.org/pdf/2404.15648.pdf

