Paper Reading AI Learner

Möbius Transform for Mitigating Perspective Distortions in Representation Learning

2024-03-07 15:39:00
Prakash Chandra Chhipa, Meenakshi Subhash Chippa, Kanjar De, Rajkumar Saini, Marcus Liwicki, Mubarak Shah

Abstract

Perspective distortion (PD) causes unprecedented changes in the shape, size, orientation, angles, and other spatial relationships of visual concepts in images. Precisely estimating camera intrinsic and extrinsic parameters is a challenging task, which hinders the synthesis of perspective distortion. The lack of dedicated training data poses a critical barrier to developing robust computer vision methods. Additionally, distortion correction methods turn other computer vision tasks into multi-step pipelines and tend to underperform. In this work, we propose mitigating perspective distortion (MPD) by applying fine-grained parameter control to a specific family of Möbius transforms, which models real-world distortion without estimating camera intrinsic and extrinsic parameters and without requiring actual distorted data. We also present ImageNet-PD, a dedicated perspectively distorted benchmark dataset for evaluating the robustness of deep learning models against perspective distortion. The proposed method achieves superior performance on the existing benchmarks ImageNet-E and ImageNet-X. It also significantly improves performance on ImageNet-PD while performing consistently on the standard data distribution. Furthermore, our method improves performance on three PD-affected real-world applications: crowd counting, fisheye image recognition, and person re-identification. We will release the source code, dataset, and models to foster further research.
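The abstract names the core tool, a Möbius transform f(z) = (az + b)/(cz + d) with ad - bc != 0, but not the specific family or how its parameters are controlled. As a rough illustration only, the sketch below treats pixel coordinates as points in the complex plane and warps an image with a general Möbius transform. It is not the authors' MPD method: the coordinate normalization, the nearest-neighbour sampling, the example coefficients, and the function names `mobius` and `mobius_warp` are all hypothetical choices made here.

```python
import numpy as np


def mobius(z, a, b, c, d):
    """Evaluate f(z) = (a*z + b) / (c*z + d) elementwise (hypothetical helper).

    Requires ad - bc != 0; points where c*z + d == 0 become inf/nan.
    """
    if abs(a * d - b * c) < 1e-12:
        raise ValueError("degenerate Möbius transform: ad - bc must be nonzero")
    with np.errstate(divide="ignore", invalid="ignore"):
        return (a * z + b) / (c * z + d)


def mobius_warp(image, a, b, c, d, fill=0):
    """Warp an H x W (or H x W x C) image with a Möbius transform (hypothetical helper).

    Each output pixel is mapped back through the inverse transform and sampled
    with nearest-neighbour lookup; pixels with no valid source get `fill`.
    """
    h, w = image.shape[:2]
    if h < 2 or w < 2:
        raise ValueError("image must be at least 2 x 2")
    cx, cy = (w - 1) / 2, (h - 1) / 2
    ys, xs = np.mgrid[0:h, 0:w]
    # Pixel grid -> complex plane, centred on the image, spanning [-1, 1] per axis.
    z_out = (xs - cx) / cx + 1j * (ys - cy) / cy
    # The inverse of f with coefficients (a, b, c, d) has coefficients (d, -b, -c, a).
    z_in = mobius(z_out, d, -b, -c, a)
    finite = np.isfinite(z_in)
    z_in = np.where(finite, z_in, 0)
    # Complex plane -> source pixel indices; clip before casting to avoid overflow.
    x_src = np.clip(np.rint(z_in.real * cx + cx), -1, w).astype(int)
    y_src = np.clip(np.rint(z_in.imag * cy + cy), -1, h).astype(int)
    valid = finite & (x_src >= 0) & (x_src < w) & (y_src >= 0) & (y_src < h)
    out = np.full_like(image, fill)
    out[valid] = image[y_src[valid], x_src[valid]]
    return out
```

For example, `mobius_warp(img, 1, 0.05, 0.3j, 1)` applies a mild, spatially varying warp to `img`, while `(1, 0, 0, 1)` is the identity. Inverse mapping is used so that every output pixel receives exactly one source sample, which avoids the holes that forward mapping would leave.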

URL

https://arxiv.org/abs/2405.02296

PDF

https://arxiv.org/pdf/2405.02296.pdf