'Don't forget to put the milk back!' Dataset for Enabling Embodied Agents to Detect Anomalous Situations

2024-04-12 21:56:21
James F. Mullen Jr, Prasoon Goyal, Robinson Piramuthu, Michael Johnston, Dinesh Manocha, Reza Ghanadan

Abstract

Home robots are intended to make their users' lives easier. Our work assists in this goal by enabling robots to inform their users of dangerous or unsanitary anomalies in their homes. Some examples of these anomalies include the user leaving their milk out, forgetting to turn off the stove, or leaving poison accessible to children. To move towards enabling home robots with these abilities, we have created a new dataset, which we call SafetyDetect. The SafetyDetect dataset consists of 1000 anomalous home scenes, each of which contains unsafe or unsanitary situations for an agent to detect. Our approach utilizes large language models (LLMs) alongside both a graph representation of the scene and the relationships between the objects in the scene. Our key insight is that this connected scene graph and the object relationships it encodes enable the LLM to better reason about the scene -- especially as it relates to detecting dangerous or unsanitary situations. Our most promising approach utilizes GPT-4 and pursues a categorization technique where object relations from the scene graph are classified as normal, dangerous, unsanitary, or dangerous for children. This method is able to correctly identify over 90% of anomalous scenarios in the SafetyDetect dataset. Additionally, we conduct real-world experiments on a ClearPath TurtleBot where we generate a scene graph from visuals of the real-world scene, and run our approach with no modification. This setup resulted in little performance loss. The SafetyDetect dataset and code will be released to the public upon this paper's publication.
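
The categorization step described in the abstract can be pictured with a small sketch. This is not the authors' code: the `Relation` schema, the prompt wording, the reply format, and the `fake_llm` stand-in are all assumptions made for illustration; only the four category names come from the abstract. In practice the `llm` argument would wrap a call to a model such as GPT-4.

```python
from dataclasses import dataclass

# The four categories named in the abstract.
LABELS = ("normal", "dangerous", "unsanitary", "dangerous for children")


@dataclass(frozen=True)
class Relation:
    """One scene-graph edge: subject --predicate--> object (hypothetical schema)."""
    subject: str
    predicate: str
    obj: str

    def as_text(self) -> str:
        return f"{self.subject} {self.predicate} {self.obj}"


def build_prompt(relations):
    """Format relations as a numbered list for an LLM (illustrative wording only)."""
    header = "Classify each object relation as one of: " + ", ".join(LABELS) + "."
    lines = [f"{i}. {r.as_text()}" for i, r in enumerate(relations, start=1)]
    return header + "\n" + "\n".join(lines)


def parse_labels(response, n):
    """Parse reply lines such as '2. dangerous'; anything unparseable stays 'normal'."""
    labels = ["normal"] * n
    for line in response.splitlines():
        num, sep, label = line.partition(".")
        label = label.strip().lower()
        if sep and num.strip().isdigit() and label in LABELS:
            idx = int(num.strip()) - 1
            if 0 <= idx < n:
                labels[idx] = label
    return labels


def find_anomalies(relations, llm):
    """Return (relation, label) pairs whose label is anything other than 'normal'."""
    labels = parse_labels(llm(build_prompt(relations)), len(relations))
    return [(r, lab) for r, lab in zip(relations, labels) if lab != "normal"]


# A toy scene with two of the anomalies mentioned in the abstract.
scene = [
    Relation("milk", "on", "kitchen counter"),
    Relation("stove", "is", "turned on"),
    Relation("book", "on", "shelf"),
]


def fake_llm(prompt):
    # Stand-in for a real model call; returns a canned reply so the sketch runs offline.
    return "1. unsanitary\n2. dangerous\n3. normal"


anomalies = find_anomalies(scene, fake_llm)
for relation, label in anomalies:
    print(f"{relation.as_text()}: {label}")
```

Classifying individual relations rather than the whole scene at once means each flagged anomaly points to a specific pair of objects, which is what a robot would need in order to tell the user what to fix.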

URL

https://arxiv.org/abs/2404.08827

PDF

https://arxiv.org/pdf/2404.08827.pdf

