Abstract
Object-Centric Learning (OCL) can discover objects in images or videos simply by reconstructing the input. For better object discovery, representative OCL methods reconstruct the input as its Variational Autoencoder (VAE) intermediate representation, which suppresses pixel noise and promotes object separability by discretizing continuous super-pixels with template features. However, treating each feature as an indivisible unit overlooks its constituent attributes, which impedes model generalization; and indexing features with scalar numbers loses attribute-level similarities and differences, which hinders model convergence. We propose \textit{Grouped Discrete Representation} (GDR) for OCL. We decompose features into combinatorial attributes via organized channel grouping, and compose these attributes into a discrete representation via tuple indexes. Experiments show that GDR consistently improves both Transformer- and Diffusion-based OCL methods on various datasets. Visualizations show that GDR captures better object separability.
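The contrast between scalar and tuple indexes can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `grouped_quantize`, the toy codebooks, and the plain contiguous channel split are assumptions made for illustration, and the "organized" part of the channel grouping is not reproduced here.

```python
import numpy as np


def grouped_quantize(features, codebooks):
    """Quantize each channel group against its own codebook (illustrative only).

    features:  (n, c) array of continuous features.
    codebooks: list of g arrays, each of shape (k, c // g); hypothetical
               per-attribute template features.
    Returns tuple indexes of shape (n, g) and quantized features of shape (n, c).
    """
    g = len(codebooks)
    n, c = features.shape
    assert c % g == 0, "channels must divide evenly into groups"
    # Assumption: groups are contiguous channel slices.
    chunks = np.split(features, g, axis=1)
    indexes, quantized = [], []
    for chunk, codebook in zip(chunks, codebooks):
        # Squared Euclidean distance from every chunk to every code: (n, k).
        dist = ((chunk[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
        idx = dist.argmin(axis=1)
        indexes.append(idx)
        quantized.append(codebook[idx])
    return np.stack(indexes, axis=1), np.concatenate(quantized, axis=1)


# Two attribute groups, three codes each, two channels per group.
codebooks = [
    np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]),
    np.array([[0.0, 0.0], [5.0, 5.0], [9.0, 9.0]]),
]
feats = np.array([[1.0, 1.0, 5.0, 5.0],
                  [1.0, 1.0, 9.0, 9.0]])

idx, quant = grouped_quantize(feats, codebooks)
# Tuple indexes: both features share the first attribute (index 1).
print(idx)  # [[1 1] [1 2]]

# Flattening each tuple into one scalar hides that shared attribute.
scalar = np.ravel_multi_index(tuple(idx.T), dims=(3, 3))
print(scalar)  # [4 5]
```

In this toy case the tuples (1, 1) and (1, 2) make the shared first attribute explicit, whereas the scalar codes 4 and 5 carry no such structure. With g groups of k codes each, the tuple form also addresses k^g combinations while storing only g·k codes.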
URL
https://arxiv.org/abs/2411.02299