Abstract
Generalization is a central challenge for neural networks, especially performance under distributions that differ from the training distribution. Current methods mostly follow a data-driven paradigm, such as data augmentation, adversarial training, and noise injection, and their generalization can be limited by the non-smoothness of the model. In this paper, we investigate generalization from a Partial Differential Equation (PDE) perspective, aiming to enhance it directly through the underlying function of the neural network rather than by adjusting the input data. Specifically, we first establish a connection between neural network generalization and the smoothness of the solution to a specific PDE, namely the transport equation. Building on this, we propose a general framework that introduces adaptive distributional diffusion into the transport equation to enhance the smoothness of its solution, thereby improving generalization. In the context of neural networks, we put this theoretical framework into practice as PDE+ (PDE with Adaptive Distributional Diffusion), which diffuses each sample into a distribution covering semantically similar inputs. This enables better coverage of distributions that may be unobserved in training, improving generalization beyond purely data-driven methods. The effectiveness of PDE+ is validated in extensive settings, including clean samples and various corruptions, demonstrating superior performance compared to state-of-the-art (SOTA) methods.
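As a rough illustration of the idea summarized above, the textbook transport equation and its diffusive counterpart can be written as follows. This is a generic sketch, not the paper's formulation: the symbols u, v, f, T, and sigma, and the choice of a constant isotropic diffusion, are assumptions made here for illustration.

```latex
% Illustrative notation only (u, v, f, T, \sigma are assumed, not taken from the paper).
% Transport equation in terminal-value form: u is constant along the flow dX/dt = v(X,t).
\begin{equation}
  \partial_t u(x,t) + v(x,t)\cdot\nabla u(x,t) = 0, \qquad u(x,T) = f(x).
\end{equation}
% Adding isotropic diffusion of strength \sigma gives a convection-diffusion equation.
\begin{equation}
  \partial_t u(x,t) + v(x,t)\cdot\nabla u(x,t) + \tfrac{\sigma^2}{2}\,\Delta u(x,t) = 0,
  \qquad u(x,T) = f(x).
\end{equation}
% Feynman-Kac representation: the solution is an average of f over noisy trajectories.
\begin{equation}
  u(x,0) = \mathbb{E}\big[\, f(X_T) \mid X_0 = x \,\big],
  \qquad dX_t = v(X_t,t)\,dt + \sigma\,dW_t .
\end{equation}
```

With sigma = 0 the expectation collapses to a single deterministic trajectory, u(x,0) = f(X_T). With sigma > 0 each input is evaluated over a cloud of nearby trajectories, which smooths u in x. Replacing the constant sigma with an input-dependent term would be one possible reading of "adaptive" diffusion, but that reading is an assumption here and is not confirmed by the abstract.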
URL
https://arxiv.org/abs/2305.15835