
Collision Cross-entropy and EM Algorithm for Self-labeled Classification

2023-03-16 23:01:54
Zhongwen Zhang, Yuri Boykov

Abstract

We propose "collision cross-entropy" as a robust alternative to the Shannon's cross-entropy in the context of self-labeled classification with posterior models. Assuming unlabeled data, self-labeling works by estimating latent pseudo-labels, categorical distributions y, that optimize some discriminative clustering criteria, e.g. "decisiveness" and "fairness". All existing self-labeled losses incorporate Shannon's cross-entropy term targeting the model prediction, softmax, at the estimated distribution y. In fact, softmax is trained to mimic the uncertainty in y exactly. Instead, we propose the negative log-likelihood of "collision" to maximize the probability of equality between two random variables represented by distributions softmax and y. We show that our loss satisfies some properties of a generalized cross-entropy. Interestingly, it agrees with the Shannon's cross-entropy for one-hot pseudo-labels y, but the training from softer labels weakens. For example, if y is a uniform distribution at some data point, it has zero contribution to the training. Our self-labeling loss combining collision cross entropy with basic clustering criteria is convex w.r.t. pseudo-labels, but non-trivial to optimize over the probability simplex. We derive a practical EM algorithm optimizing pseudo-labels y significantly faster than generic methods, e.g. the projectile gradient descent. The collision cross-entropy consistently improves the results on multiple self-labeled clustering examples using different DNNs.

URL

https://arxiv.org/abs/2303.07321

PDF

https://arxiv.org/pdf/2303.07321.pdf

