Prior-free Category-level Pose Estimation with Implicit Space Transformation

2023-03-23 17:48:12
Jianhui Liu, Yukang Chen, Xiaoqing Ye, Xiaojuan Qi

Abstract

Category-level 6D pose estimation aims to predict the poses and sizes of unseen objects from a specific category. Thanks to prior deformation, which explicitly adapts a category-specific 3D prior (i.e., a 3D template) to a given object instance, prior-based methods have attained great success and become a major research stream. However, obtaining category-specific priors requires collecting a large number of 3D models, which is labor-intensive and often not feasible in practice. This motivates us to investigate whether priors are necessary to make prior-based methods effective. Our empirical study shows that the 3D prior itself is not what accounts for the high performance. The key point is actually the explicit deformation process, which aligns camera and world coordinates under the supervision of 3D models in world space (also called canonical space). Inspired by these observations, we introduce a simple prior-free implicit space transformation network, namely IST-Net, to transform camera-space features to world-space counterparts and build correspondence between them in an implicit manner without relying on 3D priors. In addition, we design camera- and world-space enhancers to enrich the features with pose-sensitive information and geometrical constraints, respectively. Though simple, IST-Net is the first prior-free method to achieve state-of-the-art performance, with top inference speed on the REAL275 dataset. Our code and models will be publicly available.
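
To make the idea of aligning camera and world (canonical) coordinates concrete, the sketch below recovers a scale, rotation, and translation from explicitly matched 3D points using the classical closed-form similarity alignment (Umeyama). This is not IST-Net and is not taken from the paper: IST-Net builds the correspondence implicitly in feature space, whereas this example assumes explicit point-to-point matches are already given. The function name `umeyama_alignment` and its argument names are illustrative assumptions.

```python
import numpy as np


def umeyama_alignment(world_pts, camera_pts):
    """Estimate (s, R, t) such that camera_pts ~= s * R @ world_pts + t.

    world_pts, camera_pts: arrays of shape (N, 3) with row-wise correspondence.
    Illustrative helper only; not part of IST-Net.
    """
    n = world_pts.shape[0]

    # Center both point sets on their centroids.
    mu_w = world_pts.mean(axis=0)
    mu_c = camera_pts.mean(axis=0)
    w_centered = world_pts - mu_w
    c_centered = camera_pts - mu_c

    # 3x3 cross-covariance between camera-space and world-space points.
    cov = c_centered.T @ w_centered / n
    U, D, Vt = np.linalg.svd(cov)

    # Guard against a reflection so that R is a proper rotation (det = +1).
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0
    R = U @ S @ Vt

    # Scale relates the spread of the world-space points to the camera-space ones.
    var_w = (w_centered ** 2).sum() / n
    s = np.trace(np.diag(D) @ S) / var_w

    # Translation maps the scaled, rotated world centroid onto the camera centroid.
    t = mu_c - s * R @ mu_w
    return s, R, t
```

Given matched points, `s, R, t = umeyama_alignment(world_pts, camera_pts)` returns the object's scale, rotation, and translation in the camera frame, which together describe the pose and size that category-level methods aim to predict.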

URL

https://arxiv.org/abs/2303.13479

PDF

https://arxiv.org/pdf/2303.13479.pdf

