Abstract
Perspective distortion (PD) causes significant changes in the shape, size, orientation, angles, and other spatial relationships of visual concepts in images. Precisely estimating camera intrinsic and extrinsic parameters is challenging, which hinders synthesizing perspective distortion, and the lack of dedicated training data poses a further barrier to developing robust computer vision methods. Moreover, distortion-correction methods turn other computer vision tasks into multi-step pipelines and suffer degraded performance. In this work, we propose mitigating perspective distortion (MPD) by employing fine-grained parameter control on a specific family of Möbius transforms to model real-world distortion, without estimating camera intrinsic and extrinsic parameters and without requiring actual distorted data. We also present a dedicated perspectively distorted benchmark dataset, ImageNet-PD, to benchmark the robustness of deep learning models against this type of distortion. The proposed method outperforms prior work on the existing benchmarks ImageNet-E and ImageNet-X, significantly improves performance on ImageNet-PD, and maintains consistent performance on the standard data distribution. Furthermore, our method improves performance on three PD-affected real-world applications: crowd counting, fisheye image recognition, and person re-identification. We will release the source code, dataset, and models to foster further research.
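To make the core idea concrete: a Möbius transform acts on image coordinates viewed as points in the complex plane, f(z) = (a·z + b) / (c·z + d) with a·d − b·c ≠ 0. The sketch below is a minimal illustration of warping a pixel coordinate with such a transform; it is not the paper's MPD parameterization, and the function names and example parameter values are illustrative assumptions.

```python
def mobius(z, a, b, c, d):
    """Möbius transform f(z) = (a*z + b) / (c*z + d) on a complex point z.

    The map is invertible only when a*d - b*c != 0; degenerate
    parameters collapse the whole plane to a single point.
    """
    if abs(a * d - b * c) < 1e-12:
        raise ValueError("degenerate transform: a*d - b*c must be non-zero")
    return (a * z + b) / (c * z + d)


def warp_pixel(x, y, a, b, c, d):
    """Warp one pixel coordinate (x, y) by treating it as z = x + iy."""
    w = mobius(complex(x, y), a, b, c, d)
    return w.real, w.imag


# Identity parameters (a=1, b=0, c=0, d=1) leave coordinates unchanged;
# a purely translational choice (c=0, a=d=1) shifts every pixel by b.
print(warp_pixel(1.0, 2.0, 1, 0, 0, 1))       # unchanged: (1.0, 2.0)
print(warp_pixel(1.0, 2.0, 1, 1 + 1j, 0, 1))  # shifted by (1, 1): (2.0, 3.0)
```

Restricting (a, b, c, d) to a specific sub-family, as the abstract describes, constrains which warps can occur, which is what allows distortion to be synthesized with fine-grained control instead of estimating camera parameters.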
URL
https://arxiv.org/abs/2405.02296