Abstract
Interest point descriptors have fueled progress on almost every problem in computer vision. Recent advances in deep neural networks have enabled task-specific learned descriptors that outperform hand-crafted descriptors on many problems. We demonstrate that commonly used metric learning approaches do not optimally leverage the feature hierarchies learned in a Convolutional Neural Network (CNN), especially when applied to the task of geometric feature matching. While a metric loss applied to the deepest layer of a CNN is often expected to yield ideal features irrespective of the task, in fact the growing receptive field and striding effects cause shallower features to be better at high-precision matching tasks. We leverage this insight, together with explicit supervision at multiple levels of the feature hierarchy for better regularization, to learn more effective descriptors for geometric matching tasks. Further, we propose to use activation maps at different layers of a CNN as an effective and principled replacement for the multi-resolution image pyramids often used for matching tasks. We propose concrete CNN architectures employing these ideas and evaluate them on multiple datasets for 2D and 3D geometric matching as well as optical flow, demonstrating state-of-the-art results and generalization across datasets.
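To make the idea of supervising several levels of the feature hierarchy concrete, the following is a minimal sketch, not the authors' implementation. It assumes a contrastive loss as the metric loss (the abstract does not name one), nearest-neighbour lookup of descriptors, and toy feature maps stored as nested lists; the names `contrastive_loss`, `sample`, and `hierarchical_loss` are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a descriptor to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def contrastive_loss(desc_a, desc_b, is_match, margin=1.0):
    """Assumed metric loss: pull matches together, push non-matches past a margin."""
    d = math.dist(l2_normalize(desc_a), l2_normalize(desc_b))
    if is_match:
        return d * d
    return max(0.0, margin - d) ** 2


def sample(feature_map, x, y, stride):
    """Nearest-neighbour lookup: map image coordinates to a feature-map cell."""
    row = min(y // stride, len(feature_map) - 1)
    col = min(x // stride, len(feature_map[0]) - 1)
    return feature_map[row][col]


def hierarchical_loss(levels_a, levels_b, pairs, margin=1.0):
    """Average the metric loss over every supervised layer and every point pair.

    levels_a, levels_b: lists of (feature_map, stride), one entry per layer.
    pairs: list of ((xa, ya), (xb, yb), is_match) in image coordinates.
    """
    total = 0.0
    for (fmap_a, stride_a), (fmap_b, stride_b) in zip(levels_a, levels_b):
        for (xa, ya), (xb, yb), is_match in pairs:
            total += contrastive_loss(
                sample(fmap_a, xa, ya, stride_a),
                sample(fmap_b, xb, yb, stride_b),
                is_match,
                margin,
            )
    return total / (len(levels_a) * len(pairs))


# Toy example: a shallow layer (stride 1) and a deep layer (stride 2).
shallow = [[[1.0, 0.0], [0.0, 1.0]],
           [[1.0, 0.0], [0.0, 1.0]]]
deep = [[[1.0, 0.0]]]
levels = [(shallow, 1), (deep, 2)]

# Two neighbouring pixels that are NOT a match.
negative_pair = [((0, 0), (1, 0), False)]
loss = hierarchical_loss(levels, levels, negative_pair)
```

In this toy setup the shallow layer separates the two neighbouring pixels, so it contributes no loss, while the strided deep layer maps both pixels to the same cell and is penalised. This mirrors, in a very simplified form, the abstract's point that striding limits the localization precision of deeper features.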
URL
https://arxiv.org/abs/1803.07231