Abstract
Most learning-based approaches to category-level 6D pose estimation are designed around the normalized object coordinate space (NOCS). While successful, NOCS-based methods become inaccurate and less robust when handling a category whose objects exhibit significant intra-category shape variations. This is because the object coordinates induced by global, rigid alignment of objects are semantically incoherent, which makes the coordinate regression hard to learn and to generalize. We propose the Semantically-aware Object Coordinate Space (SOCS), built by warping and aligning the objects under the guidance of a sparse set of keypoints with semantically meaningful correspondence. SOCS is semantically coherent: any point on the surface of an object can be mapped to a semantically meaningful location in SOCS, allowing for accurate pose and size estimation under large shape variations. To learn effective coordinate regression to SOCS, we propose a novel multi-scale coordinate-based attention network. Evaluations demonstrate that our method is easy to train, generalizes well under large intra-category shape variations, and is robust to inter-object occlusions.
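As background for the "global and rigid alignment" mentioned above: once a network has regressed a canonical coordinate for each observed point, NOCS-style pipelines typically recover pose and size by fitting a similarity transform between the two point sets. The sketch below shows one common way to do this, the closed-form Umeyama alignment, in NumPy. It is an illustration only, not the solver used for SOCS (the abstract does not describe one), and the function name `umeyama_similarity` and the synthetic data are assumptions made for this example.

```python
import numpy as np


def umeyama_similarity(src, dst):
    """Least-squares scale s, rotation R, translation t with dst ~ s * R @ src + t.

    src: (N, 3) canonical coordinates (hypothetical network output).
    dst: (N, 3) observed camera-space points paired row by row with src.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = src.shape[0]

    # Center both point sets.
    mu_src = src.mean(axis=0)
    mu_dst = dst.mean(axis=0)
    src_c = src - mu_src
    dst_c = dst - mu_dst

    # Cross-covariance between target and source, then its SVD.
    cov = dst_c.T @ src_c / n
    U, D, Vt = np.linalg.svd(cov)

    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0

    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / n
    s = float(np.trace(np.diag(D) @ S) / var_src)
    t = mu_dst - s * R @ mu_src
    return s, R, t


# Synthetic check: transform random canonical points with a known pose, then recover it.
rng = np.random.default_rng(0)
canonical = rng.uniform(-0.5, 0.5, size=(200, 3))
angle = np.deg2rad(30.0)
R_true = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle),  np.cos(angle), 0.0],
    [0.0,            0.0,           1.0],
])
s_true, t_true = 0.25, np.array([0.1, -0.2, 0.8])
observed = s_true * canonical @ R_true.T + t_true

s_est, R_est, t_est = umeyama_similarity(canonical, observed)
```

With noise-free correspondences the recovered scale, rotation, and translation match the ground truth up to floating-point error. A single rigid fit of this kind assumes the object differs from its canonical shape only by scale and pose, which is the assumption that large intra-category shape variation breaks.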
URL
https://arxiv.org/abs/2303.10346