Abstract
Visual monitoring of industrial assembly tasks is critical for preventing equipment damage due to procedural errors and ensuring worker safety. Although commercial solutions exist, they typically require rigid workspace setups or the application of visual markers to simplify the problem. We introduce ViMAT, a novel AI-driven system for real-time visual monitoring of assembly tasks that operates without these constraints. ViMAT combines a perception module that extracts visual observations from multi-view video streams with a reasoning module that infers the most likely action being performed based on the observed assembly state and prior task knowledge. We validate ViMAT on two assembly tasks, involving the replacement of LEGO components and the reconfiguration of hydraulic press molds, demonstrating its effectiveness through quantitative and qualitative analysis in challenging real-world scenarios characterized by partial and uncertain visual observations. Project page: this https URL
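As a rough illustration of the perception-plus-reasoning split described above, the following minimal sketch fuses per-view detection confidences into an estimate of the assembly state and then picks the action most consistent with prior task knowledge. It is not ViMAT's implementation: the task steps, the state encoding, the naive independent-view fusion, and every name in it (`TASK_STEPS`, `fuse_views`, `most_likely_action`) are assumptions made for the example.

```python
from math import prod

# Hypothetical prior task knowledge (not from the paper): an ordered list of
# actions, each paired with the assembly state expected once it is completed.
# A state is a tuple of booleans: (old_part_present, new_part_present).
INITIAL_STATE = (True, False)
TASK_STEPS = [
    ("remove_old_part", (False, False)),
    ("insert_new_part", (False, True)),
]

EPS = 1e-6  # keeps probabilities away from exactly 0 and 1


def fuse_views(view_probs):
    """Fuse per-view presence probabilities for one component.

    Views that cannot see the component report None and are skipped.
    Assumes independent views and a uniform prior (naive Bayes fusion).
    """
    seen = [min(max(p, EPS), 1.0 - EPS) for p in view_probs if p is not None]
    if not seen:
        return 0.5  # no evidence from any view
    pos = prod(seen)
    neg = prod(1.0 - p for p in seen)
    return pos / (pos + neg)


def state_likelihood(state, fused):
    """Likelihood of a candidate state given fused presence probabilities."""
    return prod(f if present else 1.0 - f for present, f in zip(state, fused))


def most_likely_action(observations):
    """Return the action most likely in progress, or None if the task is done.

    observations[i] lists the per-view presence probabilities of component i.
    """
    fused = [fuse_views(views) for views in observations]
    # Candidate states: nothing done yet, then the state after each step.
    candidates = [INITIAL_STATE] + [state for _, state in TASK_STEPS]
    scores = [state_likelihood(state, fused) for state in candidates]
    best = scores.index(max(scores))
    # If the state after step k-1 is the best match, step k is in progress.
    return TASK_STEPS[best][0] if best < len(TASK_STEPS) else None
```

For example, `most_likely_action([[0.9, None, 0.8], [0.1, 0.2, None]])` treats the old part as still present and the new part as absent, so it returns the first step; a view reporting `None` stands in for an occluded camera, which is one simple way to model partial observations.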
URL
https://arxiv.org/abs/2506.15285