Paper Reading AI Learner

Deep Models for Multi-View 3D Object Recognition: A Review

2024-04-23 16:54:31
Mona Alzahrani, Muhammad Usman, Salma Kammoun, Saeed Anwar, Tarek Helmy

Abstract

Human decision-making often relies on visual information from multiple perspectives or views. In contrast, machine learning-based object recognition uses information from a single image of the object. However, a single image may not convey enough information for accurate decision-making, particularly in complex recognition problems. Multi-view 3D representations have so far shown the most promising results for object recognition, achieving state-of-the-art performance. This review comprehensively covers recent progress in multi-view 3D object recognition methods for 3D classification and retrieval tasks. Specifically, we focus on deep learning-based and transformer-based techniques, as they are widely used and have achieved state-of-the-art performance. We provide detailed information about existing deep learning-based and transformer-based multi-view 3D object recognition models, including the most commonly used 3D datasets, camera configurations and numbers of views, view selection strategies, pre-trained CNN architectures, fusion strategies, and recognition performance on 3D classification and 3D retrieval tasks. Additionally, we examine various computer vision applications that use multi-view classification. Finally, we highlight key findings and future directions for developing multi-view 3D object recognition methods, to give readers a comprehensive understanding of the field.
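Camera configuration, number of views, and fusion strategy are among the design choices the abstract lists. The Python below is a minimal sketch of two of them, written under stated assumptions and not taken from the paper. It assumes N virtual cameras spaced evenly in azimuth around an object centred at the origin at a fixed elevation. The defaults of 12 views and a 30 degree elevation match the setup used by earlier work such as MVCNN, but here they are only illustrative. It also assumes each view has already been turned into a feature vector by some CNN backbone, then fuses those vectors with element-wise max pooling (the view-pooling used by MVCNN) or mean pooling. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def circular_camera_positions(num_views: int = 12,
                              elevation_deg: float = 30.0,
                              radius: float = 2.0) -> List[Tuple[float, float, float]]:
    """Hypothetical helper: place num_views cameras evenly in azimuth on a
    circle around an object at the origin, all at the same elevation."""
    elev = math.radians(elevation_deg)
    positions = []
    for i in range(num_views):
        azim = 2.0 * math.pi * i / num_views  # evenly spaced azimuth angles
        x = radius * math.cos(elev) * math.cos(azim)
        y = radius * math.cos(elev) * math.sin(azim)
        z = radius * math.sin(elev)  # same height for every camera
        positions.append((x, y, z))
    return positions


def _check_views(view_features: Sequence[Sequence[float]]) -> int:
    """Validate that there is at least one view and all share one dimension."""
    if not view_features:
        raise ValueError("need at least one view")
    dim = len(view_features[0])
    if any(len(f) != dim for f in view_features):
        raise ValueError("all views must have the same feature dimension")
    return dim


def max_view_pooling(view_features: Sequence[Sequence[float]]) -> List[float]:
    """Fuse per-view feature vectors by taking the element-wise maximum."""
    dim = _check_views(view_features)
    return [max(f[d] for f in view_features) for d in range(dim)]


def mean_view_pooling(view_features: Sequence[Sequence[float]]) -> List[float]:
    """Fuse per-view feature vectors by taking the element-wise mean."""
    dim = _check_views(view_features)
    n = len(view_features)
    return [sum(f[d] for f in view_features) / n for d in range(dim)]
```

For example, `max_view_pooling([[0.1, 0.9, 0.3], [0.7, 0.2, 0.4], [0.5, 0.6, 0.8]])` returns `[0.7, 0.9, 0.8]`. Both pooling functions ignore the order of the views, so the fused descriptor does not depend on which camera is listed first. They are only simple baselines for the fusion step.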

URL

https://arxiv.org/abs/2404.15224

PDF

https://arxiv.org/pdf/2404.15224.pdf
