Abstract
We introduce Contrastive Gaussian Clustering, a novel approach capable of providing segmentation masks from any viewpoint and of enabling 3D segmentation of the scene. Recent works in novel-view synthesis have shown how to model the appearance of a scene via a cloud of 3D Gaussians, and how to generate accurate images from a given viewpoint by projecting the Gaussians onto it and then $\alpha$-blending their colors. Following this example, we train a model that also includes a segmentation feature vector for each Gaussian. These vectors can then be used for 3D scene segmentation, by clustering Gaussians according to their feature vectors, and to generate 2D segmentation masks, by projecting the Gaussians onto a plane and $\alpha$-blending their segmentation features. Using a combination of contrastive learning and spatial regularization, our method can be trained on inconsistent 2D segmentation masks and still learn to generate segmentation masks that are consistent across all views. Moreover, the resulting model is highly accurate, improving the IoU of the predicted masks by $+8\%$ over the state of the art. Code and trained models will be released soon.
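To make the $\alpha$-blending of segmentation features concrete, the following is a minimal sketch for a single pixel. It assumes the standard front-to-back compositing rule used in Gaussian splatting, where each Gaussian's weight is its opacity times the transmittance left by the Gaussians in front of it. The function name `alpha_blend_features` and its arguments are hypothetical and are not taken from the authors' code, which may differ.

```python
import numpy as np

def alpha_blend_features(alphas, features):
    """Blend per-Gaussian feature vectors for one pixel (illustrative sketch).

    alphas:   (N,) opacities in [0, 1], sorted from nearest to farthest.
    features: (N, D) feature vector of each Gaussian (hypothetical layout).
    Returns the (D,) blended feature for the pixel.
    """
    alphas = np.asarray(alphas, dtype=float)
    features = np.asarray(features, dtype=float)
    # Transmittance reaching Gaussian i: product of (1 - alpha_j) for j < i.
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    # Blending weight of each Gaussian along the ray.
    weights = alphas * transmittance
    return weights @ features

# Two half-transparent Gaussians with orthogonal features.
blended = alpha_blend_features([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]])
```

Here the nearer Gaussian contributes weight 0.5 and the farther one 0.25, so nearer Gaussians dominate the rendered feature, just as they dominate the rendered color.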
URL
https://arxiv.org/abs/2404.12784