Abstract
Developing robust and interpretable vision systems is a crucial step towards trustworthy artificial intelligence. In this regard, a promising paradigm embeds task-required invariant structures, e.g., geometric invariance, in the fundamental image representation. However, such invariant representations typically exhibit limited discriminability, which restricts their application to larger-scale trustworthy vision tasks. To address this open problem, we conduct a systematic investigation of hierarchical invariance from theoretical, practical, and application perspectives. At the theoretical level, we show how to construct over-complete invariants with a Convolutional Neural Network (CNN)-like hierarchical architecture, yet in a fully interpretable manner. The general blueprint, specific definitions, invariant properties, and numerical implementations are provided. At the practical level, we discuss how to customize this theoretical framework for a given task. Owing to the over-completeness, discriminative features w.r.t. the task can be adaptively formed in a Neural Architecture Search (NAS)-like manner. We support these arguments with accuracy, invariance, and efficiency results on texture, digit, and parasite classification experiments. Furthermore, at the application level, our representations are explored in real-world forensics tasks on adversarial perturbations and Artificial Intelligence Generated Content (AIGC). These applications reveal that the proposed strategy not only realizes the theoretically promised invariance, but also exhibits competitive discriminability even in the era of deep learning. For robust and interpretable vision tasks at larger scales, hierarchical invariant representations can be considered an effective alternative to traditional CNNs and invariants.
URL
https://arxiv.org/abs/2402.15430