Deep AutoEncoder-based Lossy Geometry Compression for Point Clouds

2019-04-18 02:44:50
Wei Yan, Yiting Shao, Shan Liu, Thomas H Li, Zhu Li, Ge Li

Abstract

The point cloud is a fundamental 3D representation that is widely used in real-world applications such as autonomous driving. As a newly developed media format characterized by complexity and irregularity, the point cloud creates a need for compression algorithms that are more flexible than existing codecs. Recently, autoencoders (AEs) have shown their effectiveness in many visual analysis tasks as well as in image compression, which inspires us to employ them in point cloud compression. In this paper, we propose a general autoencoder-based architecture for lossy geometry compression of point clouds. To the best of our knowledge, it is the first autoencoder-based geometry compression codec that directly takes point clouds as input rather than voxel grids or collections of images. Compared with handcrafted codecs, this approach adapts much more quickly to previously unseen media content and media formats while achieving competitive performance. Our architecture consists of a PointNet-based encoder, a uniform quantizer, an entropy estimation block, and a nonlinear synthesis transformation module. In lossy geometry compression of point clouds, results show that the proposed method outperforms the test model for categories 1 and 3 (TMC13) published by the MPEG-3DG group at the 125th meeting, achieving a 73.15% BD-rate gain on average.
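
The four components named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the layer sizes, the quantization step, the random untrained weights, and all function names (`encode`, `quantize`, `estimate_bits`, `decode`) are assumptions made for illustration. The entropy block here is a plain empirical histogram standing in for whatever estimator the paper uses, and training with a rate-distortion loss is omitted entirely.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    """Random weights for a small fully connected network (untrained, illustration only)."""
    return [(rng.normal(0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(x, layers):
    """Apply the layers with ReLU between them (no activation after the last one)."""
    for i, (w, b) in enumerate(layers):
        x = x @ w + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x

def encode(points, enc_layers):
    """PointNet-style analysis: shared per-point MLP, then order-invariant max pooling."""
    per_point = mlp(points, enc_layers)   # shape (N, latent_dim)
    return per_point.max(axis=0)          # shape (latent_dim,)

def quantize(latent, step):
    """Uniform scalar quantizer: index of the nearest multiple of `step`."""
    return np.round(latent / step).astype(np.int64)

def estimate_bits(symbols):
    """Empirical zeroth-order entropy of the symbols, as a total bit count."""
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum() * symbols.size)

def decode(symbols, dec_layers, num_points, step):
    """Nonlinear synthesis: fully connected layers mapping the code back to N x 3 points."""
    return mlp(symbols * step, dec_layers).reshape(num_points, 3)

# Hypothetical sizes chosen only to keep the example small.
N, LATENT, STEP = 256, 32, 0.05
enc = init_mlp([3, 64, LATENT])
dec = init_mlp([LATENT, 128, N * 3])

cloud = rng.uniform(-1, 1, (N, 3))        # stand-in for a real point cloud
code = quantize(encode(cloud, enc), STEP) # integer symbols to be entropy coded
bits = estimate_bits(code)                # rough rate estimate for the code
recon = decode(code, dec, N, STEP)        # reconstructed geometry, shape (N, 3)
```

Because the encoder pools with a maximum over points, the latent code does not depend on the order in which the points are listed, which is what allows a raw point set to be used as input without voxelization.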

URL

https://arxiv.org/abs/1905.03691

PDF

https://arxiv.org/pdf/1905.03691.pdf

