Abstract
Human decision-making often relies on visual information from multiple perspectives or views. In contrast, machine learning-based object recognition typically uses information from a single image of the object. However, a single image may not convey enough information for accurate decision-making, particularly in complex recognition problems. Multi-view 3D representations have so far shown the most promising results toward state-of-the-art object recognition performance. This review paper comprehensively covers recent progress in multi-view 3D object recognition methods for 3D classification and retrieval tasks. Specifically, we focus on deep learning-based and transformer-based techniques, as they are widely used and have achieved state-of-the-art performance. We provide detailed information about existing deep learning-based and transformer-based multi-view 3D object recognition models, including the most commonly used 3D datasets, camera configurations and number of views, view selection strategies, pre-trained CNN architectures, fusion strategies, and recognition performance on 3D classification and 3D retrieval tasks. Additionally, we examine various computer vision applications that use multi-view classification. Finally, we highlight key findings and future directions for developing multi-view 3D object recognition methods to provide readers with a comprehensive understanding of the field.
URL
https://arxiv.org/abs/2404.15224