Digital Twin Driven Textile Classification and Foreign Object Recognition in Automated Sorting Systems

2026-03-05 14:42:19
Serkan Ergun, Tobias Mitterer, Hubert Zangl

Abstract

The increasing demand for sustainable textile recycling requires robust automation solutions capable of handling deformable garments and detecting foreign objects in cluttered environments. This work presents a digital-twin-driven robotic sorting system that integrates grasp prediction, multi-modal perception, and semantic reasoning for real-world textile classification. A dual-arm robotic cell equipped with RGBD sensing, capacitive tactile feedback, and collision-aware motion planning autonomously separates garments from an unsorted basket, transfers them to an inspection zone, and classifies them using state-of-the-art Visual Language Models (VLMs). We benchmark nine VLMs from five model families on a dataset of 223 inspection scenarios comprising shirts, socks, trousers, underwear, foreign objects (including garments outside of the aforementioned classes), and empty scenes. The evaluation assesses per-class accuracy, hallucination behavior, and computational performance under practical hardware constraints. Results show that the Qwen model family achieves the highest overall accuracy (up to 87.9%), with strong foreign object detection performance, while lighter models such as Gemma3 offer competitive speed-accuracy trade-offs for edge deployment. A digital twin combined with MoveIt enables collision-aware path planning and integrates segmented 3D point clouds of inspected garments into the virtual environment for improved manipulation reliability. The presented system demonstrates the feasibility of combining semantic VLM reasoning with conventional grasp detection and digital twin technology for scalable, autonomous textile sorting in realistic industrial settings.
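The evaluation described in the abstract (per-class accuracy, overall accuracy, hallucination behavior) can be illustrated with a minimal sketch. This is not the authors' evaluation code. The label spellings, the function names, and in particular the hallucination proxy (the share of empty scenes in which a model reports any object) are assumptions made for illustration; the abstract does not define how hallucination is measured. The sample labels are toy values, not results from the paper.

```python
from collections import Counter

# Six scene classes named in the abstract; the exact label strings are assumed.
CLASSES = ["shirt", "sock", "trousers", "underwear", "foreign_object", "empty"]


def overall_accuracy(y_true, y_pred):
    """Fraction of scenes whose predicted label matches the ground truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_accuracy(y_true, y_pred):
    """Accuracy per ground-truth class; classes with no samples are skipped."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: correct[c] / total[c] for c in CLASSES if total[c] > 0}


def empty_scene_false_positive_rate(y_true, y_pred):
    """Assumed hallucination proxy: share of empty scenes labeled as non-empty."""
    preds_on_empty = [p for t, p in zip(y_true, y_pred) if t == "empty"]
    if not preds_on_empty:
        return 0.0
    return sum(p != "empty" for p in preds_on_empty) / len(preds_on_empty)


if __name__ == "__main__":
    # Toy labels for illustration only.
    truth = ["shirt", "shirt", "sock", "empty", "empty", "foreign_object"]
    preds = ["shirt", "sock", "sock", "empty", "shirt", "foreign_object"]
    print(overall_accuracy(truth, preds))
    print(per_class_accuracy(truth, preds))
    print(empty_scene_false_positive_rate(truth, preds))
```

Reporting accuracy per class alongside the overall figure matters for a dataset like this one, because a model can score well overall while still failing on a small class such as empty scenes or foreign objects.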

URL

https://arxiv.org/abs/2603.05230

PDF

https://arxiv.org/pdf/2603.05230.pdf

