RoadFormer: Duplex Transformer for RGB-Normal Semantic Road Scene Parsing

2023-09-19 06:32:19
Jiahang Li, Yikang Zhang, Peng Yun, Guangliang Zhou, Qijun Chen, Rui Fan

Abstract

The recent advancements in deep convolutional neural networks have shown significant promise in the domain of road scene parsing. Nevertheless, the existing works focus primarily on freespace detection, with little attention given to hazardous road defects that could compromise both driving safety and comfort. In this paper, we introduce RoadFormer, a novel Transformer-based data-fusion network developed for road scene parsing. RoadFormer utilizes a duplex encoder architecture to extract heterogeneous features from both RGB images and surface normal information. The encoded features are subsequently fed into a novel heterogeneous feature synergy block for effective feature fusion and recalibration. The pixel decoder then learns multi-scale long-range dependencies from the fused and recalibrated heterogeneous features, which are subsequently processed by a Transformer decoder to produce the final semantic prediction. Additionally, we release SYN-UDTIRI, the first large-scale road scene parsing dataset that contains over 10,407 RGB images, dense depth images, and the corresponding pixel-level annotations for both freespace and road defects of different shapes and sizes. Extensive experimental evaluations conducted on our SYN-UDTIRI dataset, as well as on three public datasets, including KITTI road, CityScapes, and ORFD, demonstrate that RoadFormer outperforms all other state-of-the-art networks for road scene parsing. Specifically, RoadFormer ranks first on the KITTI road benchmark. Our source code, created dataset, and demo video are publicly available at mias.group/RoadFormer.
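The abstract says the network consumes surface normal information alongside RGB, and that SYN-UDTIRI ships dense depth images, but it does not say how normals are derived. As an illustration only, the following sketch shows one generic way to estimate per-pixel normals from a depth image: back-project pixels to 3D and take the cross product of finite-difference tangents. This is not the authors' estimator. The pinhole intrinsics `fx`, `fy`, `cx`, `cy` and the function names are hypothetical, and a real pipeline would use an array library rather than nested lists.

```python
import math


def backproject(depth, fx, fy, cx, cy):
    """Lift each pixel (u, v) with depth z to a 3D point in the camera frame.

    Assumes a pinhole camera; fx, fy, cx, cy are hypothetical intrinsics.
    """
    return [
        [((u - cx) * z / fx, (v - cy) * z / fy, z) for u, z in enumerate(row)]
        for v, row in enumerate(depth)
    ]


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def normals_from_depth(depth, fx, fy, cx, cy):
    """Estimate unit surface normals at interior pixels of a dense depth image.

    Border pixels are left as (0, 0, 0) because central differences
    need a neighbour on both sides.
    """
    pts = backproject(depth, fx, fy, cx, cy)
    h, w = len(depth), len(depth[0])
    normals = [[(0.0, 0.0, 0.0)] * w for _ in range(h)]
    for v in range(1, h - 1):
        for u in range(1, w - 1):
            du = _sub(pts[v][u + 1], pts[v][u - 1])  # tangent along image x
            dv = _sub(pts[v + 1][u], pts[v - 1][u])  # tangent along image y
            n = _cross(du, dv)
            length = math.sqrt(sum(c * c for c in n))
            if length == 0.0:
                continue  # degenerate neighbourhood, keep the zero vector
            n = tuple(c / length for c in n)
            if n[2] > 0:
                # Flip so the normal faces the camera (camera looks along +z).
                n = tuple(-c for c in n)
            normals[v][u] = n
    return normals


if __name__ == "__main__":
    # A fronto-parallel plane 2 m away: every interior normal points at the camera.
    flat = [[2.0] * 5 for _ in range(5)]
    print(normals_from_depth(flat, 100.0, 100.0, 2.0, 2.0)[2][2])
```

The resulting three-channel normal map has the same spatial layout as the RGB image, which is what lets a two-branch encoder of the kind described above process the two inputs in parallel.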

URL

https://arxiv.org/abs/2309.10356

PDF

https://arxiv.org/pdf/2309.10356.pdf

