Abstract
In this study, Disentanglement in Difference (DiD) is proposed to address the inherent inconsistency between the statistical independence of latent variables and the goal of semantic disentanglement in disentangled representation learning. Conventional methods achieve disentangled representations by encouraging statistical independence among latent variables. However, statistical independence of latent variables does not necessarily imply that they are semantically unrelated; thus, improving statistical independence does not always enhance disentanglement performance. To address this issue, DiD is proposed to directly learn semantic differences rather than the statistical independence of latent variables. In DiD, a Difference Encoder is designed to measure semantic differences, and a contrastive loss function is established to facilitate inter-dimensional comparison. Together, they allow the model to directly differentiate and disentangle distinct semantic factors, thereby resolving the inconsistency between statistical independence and semantic disentanglement. Experimental results on the dSprites and 3DShapes datasets demonstrate that the proposed DiD outperforms existing mainstream methods across various disentanglement metrics.
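The abstract does not give the actual formulation of the Difference Encoder or the contrastive loss, so the following is only a minimal, hypothetical PyTorch sketch of one way a contrastive objective over dimension-wise latent differences might look. All names and parameters (difference_contrastive_loss, changed_dim, temperature) are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch (not the authors' code): a contrastive loss over
# dimension-wise latent differences for an image pair that differs in a
# single semantic factor.
import torch
import torch.nn.functional as F

def difference_contrastive_loss(z_a, z_b, changed_dim, temperature=0.1):
    """z_a, z_b: (batch, d) latent codes of an image pair differing in one
    semantic factor; changed_dim: (batch,) index of the latent dimension
    assumed to capture that factor (an assumption for this sketch).
    The per-dimension absolute difference is used as a logit, so the
    changed dimension is contrasted against all other dimensions."""
    diff = (z_a - z_b).abs()                     # (batch, d) per-dimension difference
    logits = diff / temperature                  # sharpen the inter-dimensional comparison
    return F.cross_entropy(logits, changed_dim)  # push the changed dim to dominate

# Usage with random tensors standing in for encoder outputs
if __name__ == "__main__":
    z_a, z_b = torch.randn(8, 10), torch.randn(8, 10)
    target = torch.randint(0, 10, (8,))
    print(difference_contrastive_loss(z_a, z_b, target).item())
```

Under this reading, disentanglement is driven by comparing semantic differences across dimensions rather than by penalizing statistical dependence among latent variables; the paper itself should be consulted for the actual loss.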
URL
https://arxiv.org/abs/2502.03123