Paper Reading AI Learner

Extensible Grounding of Speech for Robot Instruction

2018-07-31 14:31:17
Jonathan Connell

Abstract

Spoken language is a convenient interface for commanding a mobile robot. Yet for this to work a number of base terms must be grounded in perceptual and motor skills. We detail the language processing used on our robot ELI and explain how this grounding is performed, how it interacts with user gestures, and how it handles phenomena such as anaphora. More importantly, however, there are certain concepts which the robot cannot be preprogrammed with, such as the names of various objects in a household or the nature of specific tasks it may be requested to perform. In these cases it is vital that there exist a method for extending the grounding, essentially "learning by being told". We describe how this was successfully implemented for learning new nouns and verbs in a tabletop setting. Creating this language learning kernel may be the last explicit programming the robot ever needs - the core mechanism could eventually be used for imparting a vast amount of knowledge, much as a child learns from its parents and teachers.

Abstract (translated)

口语是指挥移动机器人的便捷界面。然而,为了实现这一目标,许多基础术语必须以感知和运动技能为基础。我们详细介绍了机器人ELI上使用的语言处理,并解释了这种接地的执行方式,它与用户手势的交互方式,以及它如何处理回指等现象。然而,更重要的是,存在机器人不能预先编程的某些概念,例如家庭中各种物体的名称或者可能被请求执行的特定任务的性质。在这些情况下,至关重要的是存在一种扩展基础的方法,主要是“通过被告知学习”。我们描述了如何成功地在桌面设置中学习新名词和动词。创建这种语言学习内核可能是机器人需要的最后一次显式编程 - 核心机制最终可用于传授大量知识,就像孩子从其父母和老师那里学习一样。

URL

https://arxiv.org/abs/1807.11838

PDF

https://arxiv.org/pdf/1807.11838.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot