
Recover: A Neuro-Symbolic Framework for Failure Detection and Recovery

2024-03-31 17:54:22
Cristina Cornelio, Mohammed Diab

Abstract

Recognizing failures during task execution and implementing recovery procedures is challenging in robotics. Traditional approaches rely on the availability of extensive data or a tight set of constraints, while more recent approaches leverage large language models (LLMs) to verify task steps and replan accordingly. However, these methods often operate offline, necessitating scene resets and incurring high costs. This paper introduces Recover, a neuro-symbolic framework for online failure identification and recovery. By integrating ontologies, logical rules, and LLM-based planners, Recover exploits symbolic information to enhance the ability of LLMs to generate recovery plans and to decrease the associated costs. To demonstrate the capabilities of our method in a simulated kitchen environment, we introduce OntoThor, an ontology describing the AI2Thor simulator setting. Empirical evaluation shows that OntoThor's logical rules accurately detect all failures in the analyzed tasks, and that Recover considerably outperforms a baseline method reliant solely on LLMs in both failure detection and recovery.
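To make the neuro-symbolic interplay described above concrete, the sketch below illustrates one plausible way a symbolic rule layer and an LLM-based planner could cooperate online. This is not the paper's implementation: every name (Rule, detect_failures, llm_recovery_plan, execute_with_recovery) and the example rule are hypothetical, and the LLM call is a stub.

```python
# Illustrative sketch (not the authors' code) of an online neuro-symbolic
# failure-detection-and-recovery loop: symbolic rules check the scene state
# after every action, and an LLM-based planner is queried for a recovery plan
# only when a rule fires, so execution continues without a scene reset.

from dataclasses import dataclass
from typing import Callable, Dict, List

# Symbolic scene state, e.g. {"gripper_holding_cup": False, "cup_on_table": True}
State = Dict[str, bool]


@dataclass
class Rule:
    """A logical failure-detection rule over the symbolic scene state."""
    name: str
    condition: Callable[[State], bool]  # returns True when a failure is detected


def detect_failures(state: State, rules: List[Rule]) -> List[str]:
    """Return the names of all rules whose failure condition currently holds."""
    return [rule.name for rule in rules if rule.condition(state)]


def llm_recovery_plan(task: str, failures: List[str], state: State) -> List[str]:
    """Placeholder for an LLM query. A real prompt would include the task,
    the symbolic failure description, and the current state; here we return
    a canned recovery plan to keep the sketch self-contained."""
    return ["locate(cup)", "pick_up(cup)"]


def execute_with_recovery(task: str, plan: List[str], rules: List[Rule],
                          execute: Callable[[str], State]) -> None:
    """Execute a plan step by step, splicing in recovery actions online
    whenever the symbolic layer flags a failure."""
    steps = list(plan)
    while steps:
        action = steps.pop(0)
        state = execute(action)                  # simulator/robot returns new state
        failures = detect_failures(state, rules)
        if failures:                             # symbolic rules flag the failure
            recovery = llm_recovery_plan(task, failures, state)
            steps = recovery + steps             # continue online, no scene reset


# Hypothetical rule: the cup is neither held nor on the table (it was dropped).
dropped_cup = Rule(
    name="dropped_cup",
    condition=lambda s: not s.get("gripper_holding_cup", False)
                        and not s.get("cup_on_table", False),
)
```

One design consequence of a split like this, consistent with the abstract's cost claim, is that the LLM is invoked only when a symbolic rule fires, rather than being asked to verify every step.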

URL

https://arxiv.org/abs/2404.00756

PDF

https://arxiv.org/pdf/2404.00756.pdf

