RoadFormer: Duplex Transformer for RGB-Normal Semantic Road Scene Parsing

Abstract

The recent advancements in deep convolutional neural networks have shown significant promise in the domain of road scene parsing. Nevertheless, the existing works focus primarily on freespace detection, with little attention given to hazardous road defects that could compromise both driving safety and comfort. In this paper, we introduce RoadFormer, a novel Transformer-based data-fusion network developed for road scene parsing. RoadFormer utilizes a duplex encoder architecture to extract heterogeneous features from both RGB images and surface normal information. The encoded features are subsequently fed into a novel heterogeneous feature synergy block for effective feature fusion and recalibration. The pixel decoder then learns multi-scale long-range dependencies from the fused and recalibrated heterogeneous features, which are subsequently processed by a Transformer decoder to produce the final semantic prediction. Additionally, we release SYN-UDTIRI, the first large-scale road scene parsing dataset that contains over 10,407 RGB images, dense depth images, and the corresponding pixel-level annotations for both freespace and road defects of different shapes and sizes. Extensive experimental evaluations conducted on our SYN-UDTIRI dataset, as well as on three public datasets, including KITTI road, CityScapes, and ORFD, demonstrate that RoadFormer outperforms all other state-of-the-art networks for road scene parsing. Specifically, RoadFormer ranks first on the KITTI road benchmark. Our source code, created dataset, and demo video are publicly available at mias.group/RoadFormer.
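The abstract says the second encoder branch consumes surface normal information and that SYN-UDTIRI ships dense depth images, but it does not say how normals are derived. As a rough illustration only, the sketch below estimates per-pixel normals from a depth map by treating it as a height field over the pixel grid and taking finite differences. The function name `normals_from_depth` is hypothetical, camera intrinsics are ignored, and this is not claimed to be the estimator used by the RoadFormer authors.

```python
import math


def normals_from_depth(depth):
    """Estimate unit surface normals from a dense depth map.

    ASSUMPTION: the depth map is treated as a height field z(u, v) over the
    pixel grid (no camera intrinsics), so the normal is proportional to
    (-dz/du, -dz/dv, 1). This is a simplification for illustration only.

    depth: list of H rows, each a list of W floats.
    Returns an H x W nested list of (nx, ny, nz) unit vectors.
    """
    h, w = len(depth), len(depth[0])
    normals = []
    for v in range(h):
        row = []
        for u in range(w):
            # Central differences inside the image, one-sided at the borders.
            u0, u1 = max(u - 1, 0), min(u + 1, w - 1)
            v0, v1 = max(v - 1, 0), min(v + 1, h - 1)
            dz_du = (depth[v][u1] - depth[v][u0]) / max(u1 - u0, 1)
            dz_dv = (depth[v1][u] - depth[v0][u]) / max(v1 - v0, 1)
            nx, ny, nz = -dz_du, -dz_dv, 1.0
            length = math.sqrt(nx * nx + ny * ny + nz * nz)
            row.append((nx / length, ny / length, nz / length))
        normals.append(row)
    return normals


if __name__ == "__main__":
    # A plane whose depth grows by one unit per pixel along u.
    tilted = [[float(u) for u in range(4)] for _ in range(3)]
    print(normals_from_depth(tilted)[1][2])
```

On a flat surface every normal points straight along the z axis, while a defect such as a pothole changes the normal direction sharply at its rim. That contrast is one plausible reason a normal map complements RGB input for road defect parsing.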

URL

https://arxiv.org/abs/2309.10356

PDF

https://arxiv.org/pdf/2309.10356.pdf
