Abstract
How can a robot efficiently extract a desired object from a shelf when it is fully occluded by other objects? Prior works propose geometric approaches to this problem but do not consider object semantics. Shelves in pharmacies, restaurant kitchens, and grocery stores are often organized such that semantically similar objects are placed close to one another. Can large language models (LLMs) serve as semantic knowledge sources to accelerate robotic mechanical search in semantically arranged environments? With Semantic Spatial Search on Shelves (S^4), we use LLMs to generate affinity matrices, where entries correspond to the semantic likelihood of physical proximity between objects. We derive semantic spatial distributions by synthesizing semantics with learned geometric constraints. S^4 incorporates Optical Character Recognition (OCR) and semantic refinement with predictions from ViLD, an open-vocabulary object detection model. Simulation experiments suggest that semantic spatial search reduces search time relative to pure spatial search by an average of 24% across three domains: pharmacy, kitchen, and office shelves. A manually collected dataset of 100 semantic scenes suggests that OCR and semantic refinement improve object detection accuracy by 35%. Lastly, physical experiments on a pharmacy shelf suggest a 47.1% improvement over pure spatial search. Supplementary material can be found at this https URL.
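The core idea of fusing an LLM-derived affinity matrix with a geometric prior can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the Gaussian kernel choice, and the element-wise fusion are all assumptions made for exposition.

```python
import numpy as np

def semantic_spatial_distribution(affinity, target_idx, visible_positions,
                                  geometric_prior, sigma=1.0):
    """Hypothetical sketch of combining semantics with geometry.

    affinity: (N, N) matrix; entry [i, j] is the semantic likelihood
        that objects i and j are placed near each other (e.g. LLM scores).
    target_idx: index of the fully occluded target object.
    visible_positions: dict {object_idx: (x, y)} of detected objects.
    geometric_prior: (H, W) occupancy distribution from pure spatial search.
    Returns an (H, W) distribution over shelf cells for the target.
    """
    H, W = geometric_prior.shape
    ys, xs = np.mgrid[0:H, 0:W]
    semantic = np.zeros((H, W))
    for idx, (x, y) in visible_positions.items():
        # Gaussian bump around each visible object, weighted by its
        # semantic affinity to the target object.
        kernel = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
        semantic += affinity[target_idx, idx] * kernel
    if semantic.sum() > 0:
        semantic /= semantic.sum()
    else:
        # No semantic evidence: fall back to a uniform semantic term.
        semantic = np.ones((H, W)) / (H * W)
    combined = semantic * geometric_prior  # fuse semantics with geometry
    return combined / combined.sum()
```

Under this sketch, the search policy would reveal the highest-probability cell of the fused distribution first, which is what lets semantic cues shorten the search relative to geometry alone.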
URL
https://arxiv.org/abs/2302.12915