
ATOM: Attention Mixer for Efficient Dataset Distillation

2024-05-02 15:15:01
Samir Khaki, Ahmad Sajedi, Kai Wang, Lucy Z. Liu, Yuri A. Lawryshyn, Konstantinos N. Plataniotis

Abstract

Recent works in dataset distillation seek to minimize training expenses by generating a condensed synthetic dataset that encapsulates the information present in a larger real dataset. These approaches ultimately aim to attain test accuracy levels akin to those achieved by models trained on the entirety of the original dataset. Previous studies in feature and distribution matching have achieved significant results without incurring the costs of bi-level optimization in the distillation process. Despite their convincing efficiency, many of these methods suffer from marginal downstream performance improvements, limited distillation of contextual information, and subpar cross-architecture generalization. To address these challenges in dataset distillation, we propose the ATtentiOn Mixer (ATOM) module to efficiently distill large datasets using a mixture of channel-wise and spatial-wise attention in the feature matching process. Spatial-wise attention guides the learning process based on the consistent localization of classes in their respective images, allowing distillation from a broader receptive field. Meanwhile, channel-wise attention captures the contextual information associated with the class itself, making the synthetic image more informative for training. By integrating both types of attention, our ATOM module demonstrates superior performance across various computer vision datasets, including CIFAR10/100 and TinyImagenet. Notably, our method significantly improves performance in scenarios with a low number of images per class, enhancing its practical value. Furthermore, the improvements carry over to cross-architecture evaluation and applications such as neural architecture search.

Abstract (translated)

Recent research in dataset distillation aims to minimize training costs by generating a condensed synthetic dataset that contains the information of a larger real dataset. These methods ultimately aim to achieve test accuracy comparable to that of models trained on the entire original dataset. Prior work on feature matching and distribution matching has achieved significant results without incurring the cost of bi-level optimization in the distillation process. Despite their appealing efficiency, many of these methods yield only marginal improvements in downstream performance, extract limited contextual information, and generalize poorly across model architectures. To address these challenges in dataset distillation, we propose the ATtentiOn Mixer (ATOM) module, which uses a mixture of channel-wise and spatial-wise attention to efficiently distill large datasets during feature matching. Spatial-wise attention helps guide the learning process based on the consistent localization of classes in their respective images, enabling distillation from a broader receptive field. Meanwhile, channel-wise attention captures the contextual information associated with the class itself, making the synthetic images more informative for training. By integrating both types of attention, our ATOM module outperforms prior methods on a variety of computer vision datasets, including CIFAR10/100 and TinyImagenet. Notably, our method significantly improves performance when the number of images per class is low, enhancing its potential. Furthermore, we also maintain improvements in applications such as neural architecture search.
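
To make the abstract's attention-mixing objective concrete, below is a minimal sketch in PyTorch of matching spatial-wise and channel-wise attention between real and synthetic feature maps. All names (spatial_attention, channel_attention, atom_matching_loss) and the specific pooling, normalization, and weighting choices are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of attention-mixed feature matching in the spirit of ATOM.
# Assumes PyTorch. The exact attention formulation (pooling power, normalization,
# layer selection, loss weighting) may differ from the paper's implementation.
import torch
import torch.nn.functional as F

def spatial_attention(feat: torch.Tensor) -> torch.Tensor:
    """Pool across channels: (B, C, H, W) -> L2-normalized (B, H*W)."""
    attn = feat.pow(2).mean(dim=1).flatten(1)
    return F.normalize(attn, dim=1)

def channel_attention(feat: torch.Tensor) -> torch.Tensor:
    """Pool across spatial dims: (B, C, H, W) -> L2-normalized (B, C)."""
    attn = feat.pow(2).mean(dim=(2, 3))
    return F.normalize(attn, dim=1)

def atom_matching_loss(real_feats, syn_feats, lam=0.5):
    """Match a mixture of spatial and channel attention between real and
    synthetic features at each selected layer; lam balances the two terms."""
    loss = real_feats[0].new_zeros(())
    for fr, fs in zip(real_feats, syn_feats):
        loss = loss + lam * F.mse_loss(spatial_attention(fr), spatial_attention(fs))
        loss = loss + (1 - lam) * F.mse_loss(channel_attention(fr), channel_attention(fs))
    return loss

# Example usage with random stand-ins for features from two network layers;
# gradients flow back into the synthetic features (and hence synthetic images).
real = [torch.randn(8, 64, 16, 16), torch.randn(8, 128, 8, 8)]
syn = [torch.randn(8, 64, 16, 16, requires_grad=True),
       torch.randn(8, 128, 8, 8, requires_grad=True)]
atom_matching_loss(real, syn).backward()
```

In an actual distillation loop, the synthetic images would be updated by backpropagating this loss through a feature extractor, with per-class batches matched layer by layer.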

URL

https://arxiv.org/abs/2405.01373

PDF

https://arxiv.org/pdf/2405.01373.pdf

