Abstract
We study a deep linear network endowed with structure. It takes the form of a matrix $X$ obtained by multiplying $K$ matrices (called factors, each corresponding to the action of a layer). The action of each layer (i.e., a factor) is obtained by applying a fixed linear operator to a vector of parameters satisfying a constraint. The number of layers is not limited. Assuming that $X$ is given and the factors have been estimated, the error between the product of the estimated factors and $X$ (i.e., the reconstruction error) is either the statistical or the empirical risk. In this paper, we provide necessary and sufficient conditions on the network topology under which a stability property holds. The stability property requires that the error on the parameters defining the factors (i.e., the stability of the recovered parameters) scales linearly with the reconstruction error (i.e., the risk). Therefore, under these conditions on the network topology, any successful learning task leads to stably defined features and, therefore, to interpretable layers/networks. To do so, we first evaluate how the Segre embedding and its inverse distort distances. Then, we show that any deep structured linear network can be cast as a generic multilinear problem (that uses the Segre embedding). This is the {\em tensorial lifting}. Using the tensorial lifting, we provide necessary and sufficient conditions for the identifiability of the factors (up to a scale rearrangement). We finally provide the necessary and sufficient condition, called the \NSPlong~(because of the analogy with the usual Null Space Property in the compressed sensing framework), which guarantees that the stability property holds. We illustrate the theory with a practical example in which the deep structured linear network is a convolutional linear network. As expected, the conditions are rather strong but not vacuous. A simple test on the network topology can be implemented to check whether the condition holds.
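To make the setup concrete, a minimal sketch of the model and the stability property in LaTeX; the notation ($A_k$, $h_k$, the norms, and the constant $C$) is illustrative and not taken from the paper:

```latex
% Deep structured linear network: X is a product of K factors,
% each factor obtained by a fixed linear operator A_k applied to a
% constrained parameter vector h_k (notation is an assumption):
\[
  X = X_1 X_2 \cdots X_K, \qquad X_k = A_k h_k,
  \qquad h_k \in \mathcal{C}_k .
\]
% Stability property: the parameter error (up to a scale
% rearrangement of the factors) scales linearly with the
% reconstruction error, for some constant C depending on the
% network topology:
\[
  \inf_{\text{scale rearr.}} \; \sum_{k=1}^{K} \| h_k - \hat h_k \|
  \;\le\; C \, \bigl\| X - \hat X_1 \hat X_2 \cdots \hat X_K \bigr\| .
\]
```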
URL
https://arxiv.org/abs/1703.08044