Abstract
Discriminative deep learning models with a linear+softmax final layer have a problem: the latent space only predicts the conditional probabilities $p(Y|X)$, not the full joint distribution $p(Y,X)$, which necessitates a generative approach. Because the conditional probability cannot detect outliers, softmax networks are sensitive to outliers. This exacerbates model over-confidence, which contributes to many problems, such as hallucinations, confounding biases, and dependence on large datasets. To address this, we introduce a novel embedding constraint based on the Method of Moments (MoM). We investigate polynomial moments ranging from 1st-order through 4th-order hyper-covariance matrices. Furthermore, we use this embedding constraint to train an Axis-Aligned Gaussian Mixture Model (AAGMM) final layer, which learns not only the conditional but also the joint distribution of the latent space. We apply this method to semi-supervised image classification by extending FlexMatch with our technique. We find that our MoM constraint with the AAGMM layer matches the reported FlexMatch accuracy while also modeling the joint distribution, thereby reducing outlier sensitivity. We also present a preliminary outlier-detection strategy based on Mahalanobis distance and discuss future improvements to this strategy. Code is available at: \url{this https URL}
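The moment-matching idea behind the MoM embedding constraint can be sketched as follows. This is a minimal per-dimension illustration, not the paper's method: the paper matches moments up to 4th-order hyper-covariance matrices (i.e. including cross-dimension terms), whereas this sketch only penalizes deviation of each latent dimension's 1st through 4th moments from those of a standard Gaussian (0, 1, 0, 3). The function name `mom_penalty` and all numeric choices are illustrative assumptions.

```python
import numpy as np

def mom_penalty(z):
    """Per-dimension Method-of-Moments penalty (illustrative sketch).

    z: (batch, dim) array of latent embeddings.
    Penalizes deviation of the batch's 1st-4th raw moments from the
    standard-Gaussian targets 0, 1, 0, 3 in each dimension. The paper's
    constraint additionally uses cross-dimension (hyper-covariance) terms.
    """
    m1 = z.mean(axis=0)           # target 0 (zero mean)
    m2 = (z ** 2).mean(axis=0)    # target 1 (unit variance)
    m3 = (z ** 3).mean(axis=0)    # target 0 (zero skew)
    m4 = (z ** 4).mean(axis=0)    # target 3 (Gaussian kurtosis)
    return float((m1 ** 2 + (m2 - 1) ** 2 + m3 ** 2 + (m4 - 3) ** 2).mean())

rng = np.random.default_rng(0)
z = rng.standard_normal((4096, 8))   # roughly Gaussian batch -> small penalty
print(mom_penalty(z))                # near 0
print(mom_penalty(z + 2.0))          # shifted batch -> much larger penalty
```

In training, such a penalty would be added to the classification loss so the latent space is pushed toward a known reference distribution, which is what makes a generative final layer tractable.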
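The preliminary outlier-detection strategy the abstract describes, Mahalanobis distance under an axis-aligned (diagonal-covariance) Gaussian mixture, can be sketched as below. The component means, variances, and threshold are illustrative assumptions, not the paper's learned AAGMM parameters; with a diagonal covariance the Mahalanobis distance reduces to a per-dimension standardized Euclidean distance.

```python
import numpy as np

def mahalanobis_aagmm(z, means, variances):
    """Mahalanobis distance from latent vector z to each mixture component.

    means, variances: (K, D) arrays of per-component means and diagonal
    covariances. Returns a (K,) array of distances.
    """
    diff = z[None, :] - means                              # (K, D)
    return np.sqrt(np.sum(diff ** 2 / variances, axis=1))  # (K,)

def is_outlier(z, means, variances, threshold=3.0):
    """Flag z as an outlier if even the NEAREST component is farther
    than `threshold` (in standard deviations); threshold is illustrative."""
    return bool(mahalanobis_aagmm(z, means, variances).min() > threshold)

# Hypothetical 2-component AAGMM in a 2-D latent space.
means = np.array([[0.0, 0.0], [5.0, 5.0]])
variances = np.array([[1.0, 1.0], [0.5, 2.0]])

print(is_outlier(np.array([0.5, -0.5]), means, variances))    # False
print(is_outlier(np.array([20.0, -20.0]), means, variances))  # True
```

Because the AAGMM layer models the joint distribution of the latent space, a sample far from every component under this metric can be rejected as an outlier instead of being assigned an over-confident softmax label.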
URL
https://arxiv.org/abs/2404.17978