Multi-scale Unified Network for Image Classification

Abstract

Convolutional Neural Networks (CNNs) have advanced significantly in visual representation learning and recognition. However, they face notable challenges in performance and computational efficiency when dealing with real-world, multi-scale image inputs. Conventional methods rescale all input images to a fixed size. A larger fixed size favors performance, but rescaling small images to a larger size incurs digitization noise and increases computation cost. In this work, we carry out a comprehensive, layer-wise investigation of how CNN models respond to scale variation, based on Centered Kernel Alignment (CKA) analysis. The observations reveal that lower layers are more sensitive to input image scale variations than high-level layers. Inspired by this insight, we propose the Multi-scale Unified Network (MUSN), consisting of multi-scale subnets, a unified network, and a scale-invariant constraint. Our method divides the shallow layers into multi-scale subnets to enable feature extraction from multi-scale inputs, and the low-level features are unified in the deep layers to extract high-level semantic features. A scale-invariant constraint is imposed to maintain feature consistency across different scales. Extensive experiments on ImageNet and other scale-diverse datasets demonstrate that MUSN achieves significant improvements in both model performance and computational efficiency. In particular, MUSN yields an accuracy increase of up to 44.53% and reduces FLOPs by 7.01-16.13% in multi-scale scenarios.
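To make the CKA-based comparison more concrete, the following is a minimal sketch of linear CKA between two feature matrices. The abstract does not say which CKA variant the authors use, so the linear form here is an assumption, and the function name `linear_cka` and the random arrays standing in for layer activations are hypothetical.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between two feature matrices of shape (n_samples, n_features).

    Illustrative sketch only; not taken from the paper's implementation.
    """
    # Center each feature column so the similarity ignores mean offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # ||Y^T X||_F^2 normalized by ||X^T X||_F * ||Y^T Y||_F.
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return cross / (norm_x * norm_y)


# Hypothetical stand-ins for the activations of one layer on the same
# 64 images presented at two different input scales.
rng = np.random.default_rng(0)
feats_scale_a = rng.normal(size=(64, 32))

# An orthogonal rotation plus uniform rescaling leaves linear CKA unchanged.
q, _ = np.linalg.qr(rng.normal(size=(32, 32)))
feats_rotated = 2.0 * feats_scale_a @ q

# Independent features should score lower than the rotated copy.
feats_unrelated = rng.normal(size=(64, 32))

same = linear_cka(feats_scale_a, feats_rotated)
diff = linear_cka(feats_scale_a, feats_unrelated)
print(f"rotated copy: {same:.4f}, unrelated: {diff:.4f}")
```

In a layer-wise study of this kind, one would compute such a score per layer between activations for the same images at two input resolutions. Under that reading, a low score at a given layer indicates that the layer's representation changes strongly with input scale.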

URL

https://arxiv.org/abs/2403.18294

PDF

https://arxiv.org/pdf/2403.18294.pdf