Abstract
Because most attributes are scarce, realistic pedestrian attribute datasets exhibit a heavily skewed data distribution, which gives rise to two types of model failure: (1) label imbalance: model predictions lean strongly toward the majority labels; (2) semantics imbalance: the model easily overfits the under-represented attributes because of their insufficient semantic diversity. To achieve perfect label balancing, we propose a novel framework that decouples label-balanced data re-sampling from the curse of attribute co-occurrence, i.e., we equalize the sampling prior of one attribute without biasing that of the co-occurring others. To diversify attribute semantics and mitigate feature noise, we propose a Bayesian feature augmentation method that introduces true in-distribution novelty. By handling both imbalances jointly, our work achieves the best accuracy on various popular benchmarks and, importantly, does so with a minimal computational budget.
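The co-occurrence problem named in the abstract can be illustrated with a small worked example. The sketch below is not the proposed framework; it only shows, on a hypothetical two-attribute toy dataset (the counts and the attribute names `a` and `b` are assumptions made for illustration), how naive inverse-frequency re-sampling that balances one attribute shifts the sampling prior of a co-occurring attribute.

```python
from fractions import Fraction

# Hypothetical toy dataset: counts of images per (a, b) label pair.
# Attribute `a` is rare (10%) and usually co-occurs with `b`.
counts = {(1, 1): 8, (1, 0): 2, (0, 1): 10, (0, 0): 80}


def prior(counts, weights, index):
    """Sampling probability that the attribute at `index` is positive."""
    total = sum(counts[k] * weights[k] for k in counts)
    positive = sum(counts[k] * weights[k] for k in counts if k[index] == 1)
    return positive / total


# Uniform sampling: every image has the same weight.
uniform = {k: Fraction(1) for k in counts}
p_a = prior(counts, uniform, 0)  # 1/10
p_b = prior(counts, uniform, 1)  # 9/50

# Naive balancing of `a`: weight each image by 0.5 / P(its `a` label).
balanced_a = {
    k: Fraction(1, 2) / (p_a if k[0] == 1 else 1 - p_a) for k in counts
}
new_p_a = prior(counts, balanced_a, 0)  # a is now balanced: 1/2
new_p_b = prior(counts, balanced_a, 1)  # b has drifted: 41/90

print(f"P(a=1): {float(p_a):.3f} -> {float(new_p_a):.3f}")
print(f"P(b=1): {float(p_b):.3f} -> {float(new_p_b):.3f}")
```

In this toy case, balancing `a` moves the prior of `b` from 0.18 to about 0.456 even though `b` was never targeted, which is the side effect the abstract describes as biasing the co-occurring attributes.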
URL
https://arxiv.org/abs/2405.04858