Abstract
The increasing demand for sustainable textile recycling requires robust automation solutions capable of handling deformable garments and detecting foreign objects in cluttered environments. This work presents a digital-twin-driven robotic sorting system that integrates grasp prediction, multimodal perception, and semantic reasoning for real-world textile classification. A dual-arm robotic cell equipped with RGB-D sensing, capacitive tactile feedback, and collision-aware motion planning autonomously separates garments from an unsorted basket, transfers them to an inspection zone, and classifies them using state-of-the-art Visual Language Models (VLMs). We benchmark nine VLMs from five model families on a dataset of 223 inspection scenarios comprising shirts, socks, trousers, underwear, foreign objects (including garments outside the aforementioned classes), and empty scenes. The evaluation assesses per-class accuracy, hallucination behavior, and computational performance under practical hardware constraints. Results show that the Qwen model family achieves the highest overall accuracy (up to 87.9%), with strong foreign-object detection performance, while lighter models such as Gemma3 offer competitive speed-accuracy trade-offs for edge deployment. A digital twin combined with MoveIt enables collision-aware path planning and integrates segmented 3D point clouds of inspected garments into the virtual environment for improved manipulation reliability. The presented system demonstrates the feasibility of combining semantic VLM reasoning with conventional grasp detection and digital twin technology for scalable, autonomous textile sorting in realistic industrial settings.
URL
https://arxiv.org/abs/2603.05230