Abstract
Zero-shot learning has consistently made remarkable progress by modeling nuanced one-to-one visual-attribute correlations. Existing studies refine a single uniform mapping function to align and correlate sample regions with sub-attributes, ignoring two crucial issues: 1) the inherent asymmetry of attributes; and 2) the unused channel information. This paper addresses these issues with a simple yet effective approach, dubbed Dual Expert Distillation Network (DEDN), in which two experts are dedicated to coarse- and fine-grained visual-attribute modeling, respectively. Concretely, the coarse expert, cExp, has a complete perceptual scope to coordinate visual-attribute similarity metrics across dimensions, while the fine expert, fExp, consists of multiple specialized subnetworks, each corresponding to an exclusive set of attributes. The two experts cooperatively distill from each other to reach mutual agreement during training. We further equip DEDN with a newly designed backbone network, the Dual Attention Network (DAN), which incorporates both region and channel attention information to fully exploit visual semantic knowledge. Experiments on various benchmark datasets indicate a new state of the art.
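To make the idea of two experts distilling from each other concrete, the following is a minimal sketch of one common way to express mutual agreement between two predictors: a symmetric KL divergence between their class distributions. The abstract does not state the actual distillation objective used in DEDN, so the symmetric KL form, the temperature parameter, and all function names here (`softmax`, `kl_divergence`, `mutual_distillation_loss`) are illustrative assumptions, not the authors' implementation.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw scores into a probability distribution (numerically stable)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mutual_distillation_loss(coarse_logits, fine_logits, temperature=1.0):
    """Assumed objective: symmetric KL between the two experts' predictions.

    coarse_logits: class scores from the coarse expert (cExp in the abstract).
    fine_logits:   class scores from the fine expert (fExp in the abstract).
    The loss is zero when both experts predict the same distribution.
    """
    p_coarse = softmax(coarse_logits, temperature)
    p_fine = softmax(fine_logits, temperature)
    return kl_divergence(p_coarse, p_fine) + kl_divergence(p_fine, p_coarse)


if __name__ == "__main__":
    # Hypothetical class scores for a single sample over three classes.
    coarse = [2.0, 0.5, -1.0]
    fine = [1.5, 1.0, -0.5]
    print(mutual_distillation_loss(coarse, fine))
```

In a training loop, a term like this would typically be added to each expert's own classification loss, so that each expert is pulled toward the other's prediction while still fitting the labels.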
URL
https://arxiv.org/abs/2404.16348