Paper Reading AI Learner

The ApolloScape Dataset for Autonomous Driving

2018-07-12 10:11:43
Xinyu Huang, Xinjing Cheng, Qichuan Geng, Binbin Cao, Dingfu Zhou, Peng Wang, Yuanqing Lin, Ruigang Yang

Abstract

Scene parsing aims to assign a class (semantic) label to each pixel in an image, providing a comprehensive analysis of the image. Given the rise of autonomous driving, pixel-accurate environmental perception is expected to be a key enabling technology. However, providing a large-scale dataset for the design and evaluation of scene parsing algorithms, in particular for outdoor scenes, has been difficult. The per-pixel labelling process is prohibitively expensive, limiting the scale of existing datasets. In this paper, we present a large-scale open dataset, ApolloScape, that consists of RGB videos and corresponding dense 3D point clouds. Compared with existing datasets, our dataset has the following unique properties. The first is its scale: our initial release contains over 140K images, each with a per-pixel semantic mask, and up to 1M are scheduled. The second is its complexity: captured in various traffic conditions, the number of moving objects averages from tens to over one hundred. The third is its 3D attribute: each image is tagged with pose information at centimeter accuracy, and the static background point cloud has millimeter relative accuracy. We are able to label this many images through an interactive and efficient labelling pipeline that utilizes the high-quality 3D point cloud. Moreover, our dataset also contains different lane markings based on lane colors and styles. We expect our new dataset to benefit various autonomous driving applications, including but not limited to 2D/3D scene understanding, localization, transfer learning, and driving simulation.

URL

https://arxiv.org/abs/1803.06184

PDF

https://arxiv.org/pdf/1803.06184.pdf

