Abstract
Glass surfaces, ubiquitous in both daily life and professional environments, present a potential threat to vision-based systems such as robot and drone navigation. To address this challenge, recent studies have shown significant interest in Video Glass Surface Detection (VGSD). We observe that objects in the reflection (or transmission) layer appear to lie farther away than the glass surface itself. Consequently, in video motion scenarios, notable reflected (or transmitted) objects on the glass surface move slower than objects in non-glass regions within the same spatial plane, and this motion inconsistency can effectively reveal the presence of glass surfaces. Based on this observation, we propose a novel network, named MVGD-Net, for detecting glass surfaces in videos by leveraging motion inconsistency cues. MVGD-Net features three novel modules: the Cross-scale Multimodal Fusion Module (CMFM), which integrates extracted spatial features and estimated optical flow maps, and the History Guided Attention Module (HGAM) and Temporal Cross Attention Module (TCAM), both of which further enhance temporal features. A Temporal-Spatial Decoder (TSD) is also introduced to fuse the spatial and temporal features and generate the glass region mask. Furthermore, to train our network, we introduce a large-scale dataset comprising 312 diverse glass scenarios with a total of 19,268 frames. Extensive experiments demonstrate that MVGD-Net outperforms relevant state-of-the-art methods.
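The motion inconsistency cue can be illustrated with basic motion parallax. The sketch below is not taken from the paper and is not part of MVGD-Net. It assumes an ideal pinhole camera translating sideways, parallel to a planar glass surface, and a static scene. The function name and all numeric values are hypothetical and chosen only for illustration.

```python
def image_shift_px(focal_px: float, camera_shift_m: float, depth_m: float) -> float:
    """Image displacement (in pixels) of a static point when a pinhole camera
    translates sideways by camera_shift_m, parallel to the image plane."""
    if depth_m <= 0:
        raise ValueError("depth_m must be positive")
    return focal_px * camera_shift_m / depth_m


# Hypothetical values, for illustration only.
FOCAL_PX = 1000.0        # focal length expressed in pixels
CAMERA_SHIFT_M = 0.05    # sideways camera motion between two frames
GLASS_DEPTH_M = 2.0      # camera-to-glass distance (e.g. the window frame)
OBJECT_TO_GLASS_M = 3.0  # distance from the reflected object to the glass

# A point on the window frame lies in the same plane as the glass.
frame_shift = image_shift_px(FOCAL_PX, CAMERA_SHIFT_M, GLASS_DEPTH_M)

# A planar reflector places the virtual image as far behind the glass as the
# real object is in front of it, so its apparent depth is the sum of the two.
reflection_shift = image_shift_px(
    FOCAL_PX, CAMERA_SHIFT_M, GLASS_DEPTH_M + OBJECT_TO_GLASS_M
)

print(f"window frame shift: {frame_shift:.1f} px")
print(f"reflection shift:   {reflection_shift:.1f} px")
```

Under these assumptions the reflected content shifts by fewer pixels per frame than the surrounding non-glass region at the glass plane, which is the kind of discrepancy an optical flow map could expose.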
URL
https://arxiv.org/abs/2601.13715