Semantic-Based Active Perception for Humanoid Visual Tasks with Foveal Sensors

2024-04-16 18:15:57
João Luzio, Alexandre Bernardino, Plinio Moreno

Abstract

The aim of this work is to establish how accurately a recent semantic-based foveal active perception model can complete visual tasks that humans perform routinely, namely scene exploration and visual search. The model exploits the ability of current object detectors to localize and classify a large number of object classes, and it updates a semantic description of the scene across multiple fixations. It has previously been applied to scene exploration tasks. In this paper, we revisit the model and extend its application to visual search tasks. To illustrate the benefits of using semantic information in scene exploration and visual search, we compare its performance against traditional saliency-based models. In scene exploration, the semantic-based method represents the semantic information present in the visual scene more accurately than the traditional saliency-based model. In visual search experiments, where the task is to find instances of a target class in a visual field containing multiple distractors, the semantic-based model outperforms both the saliency-driven model and a random gaze selection algorithm. Our results demonstrate that top-down semantic information significantly influences visual exploration and search, suggesting its integration with traditional bottom-up cues as a promising direction for future research.
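To make the mechanism described in the abstract concrete, the sketch below implements one plausible reading of such a loop: a grid of per-class probabilities serves as the semantic map, detections from a stubbed foveal detector are fused into it across fixations, and the next gaze target is chosen top-down. Everything here (the `detect_objects` stub, the grid resolution, the Bayes-style update) is an illustrative assumption, not the authors' implementation or API.

```python
# Minimal sketch of a semantic-map active-perception loop, assuming a grid
# world and a stubbed foveal detector. All names and parameters here are
# illustrative assumptions, not the paper's actual model.
import numpy as np

GRID, N_CLASSES, TARGET = 8, 5, 2   # map resolution, class count, search target

def detect_objects(fix, rng):
    """Stub foveal detector: yields (cell, class, confidence) triples near
    the fixation, with confidence decaying with eccentricity (foveation)."""
    dets = []
    for _ in range(3):
        cell = tuple(int(c) for c in
                     np.clip(np.array(fix) + rng.integers(-1, 2, size=2), 0, GRID - 1))
        ecc = np.linalg.norm(np.subtract(cell, fix))
        dets.append((cell, int(rng.integers(N_CLASSES)), max(0.1, 0.9 - 0.3 * ecc)))
    return dets

def entropy(p):
    """Per-cell entropy of the class distributions (an uncertainty map)."""
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

rng = np.random.default_rng(0)
semantic_map = np.full((GRID, GRID, N_CLASSES), 1.0 / N_CLASSES)  # uniform prior
fix = (GRID // 2, GRID // 2)

for t in range(10):
    # Update the semantic description with this fixation's detections.
    for cell, cls, conf in detect_objects(fix, rng):
        like = np.full(N_CLASSES, (1.0 - conf) / (N_CLASSES - 1))
        like[cls] = conf                      # simple likelihood from confidence
        post = semantic_map[cell] * like      # Bayes-style fusion across fixations
        semantic_map[cell] = post / post.sum()
    # Top-down gaze selection: most uncertain cell for exploration, or the
    # cell most likely to contain the target class for visual search.
    explore = np.unravel_index(entropy(semantic_map).argmax(), (GRID, GRID))
    search = np.unravel_index(semantic_map[..., TARGET].argmax(), (GRID, GRID))
    fix = tuple(int(i) for i in search)       # use `explore` for exploration
    print(f"t={t} fixate {fix} P(target)={semantic_map[fix][TARGET]:.2f}")
```

Swapping the `search` policy for `explore` switches the same loop between the two tasks the paper evaluates, which is the design point of interest: the map update is shared, only the gaze-selection criterion changes.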

URL

https://arxiv.org/abs/2404.10836

PDF

https://arxiv.org/pdf/2404.10836.pdf

