Abstract
This paper introduces a novel framework for zero-shot learning (ZSL), i.e., recognizing categories unseen during training, based on a multi-model and multi-alignment integration method. Specifically, we propose three strategies to improve the model's ZSL performance: 1) using the extensive knowledge of ChatGPT and the powerful image generation capabilities of DALL-E to create reference images that precisely describe unseen categories and classification boundaries, thereby alleviating the information-bottleneck issue; 2) integrating the text-image and image-image alignment results from CLIP with the image-image alignment results from DINO to achieve more accurate predictions; 3) introducing an adaptive, confidence-based weighting mechanism to aggregate the outcomes of the different prediction methods. Experimental results on multiple datasets, including CIFAR-10, CIFAR-100, and TinyImageNet, demonstrate that our model significantly improves classification accuracy over single-model approaches, achieving AUROC scores above 96% on all test datasets and surpassing 99% on CIFAR-10.
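The confidence-based adaptive weighting in strategy 3 can be illustrated with a minimal sketch. The sketch below assumes each alignment head (CLIP text-image, CLIP image-image against DALL-E reference images, DINO image-image) yields a per-class similarity vector, and uses the maximum softmax probability as the confidence measure; both the function names and the choice of confidence measure are illustrative assumptions, not necessarily the paper's exact formulation.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def confidence_weighted_ensemble(score_vectors):
    """Aggregate per-class similarity vectors from several alignment heads.

    Each head's scores are converted to probabilities; its weight is its
    confidence (max probability, an illustrative choice), normalized so
    the weights sum to 1. Returns the combined class distribution.
    """
    probs = [softmax(s) for s in score_vectors]
    confidences = np.array([p.max() for p in probs])
    weights = confidences / confidences.sum()
    return sum(w * p for w, p in zip(weights, probs))

# Toy example: three alignment heads scoring 4 candidate classes.
clip_text_image  = np.array([2.0, 0.1, 0.1, 0.1])  # CLIP text-image alignment
clip_image_image = np.array([1.5, 0.2, 0.2, 0.2])  # CLIP vs. DALL-E reference images
dino_image_image = np.array([0.5, 0.4, 0.4, 0.4])  # DINO image-image alignment

combined = confidence_weighted_ensemble(
    [clip_text_image, clip_image_image, dino_image_image]
)
print(combined.argmax())  # predicted class index
```

Heads that produce a peaked (high-confidence) distribution contribute more to the final prediction, while near-uniform heads are down-weighted.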
URL
https://arxiv.org/abs/2405.02155