Weakly Supervised Attended Object Detection Using Gaze Data as Annotations

2022-04-14 16:31:24

Michele Mazzamuto, Francesco Ragusa, Antonino Furnari, Giovanni Signorello, Giovanni Maria Farinella

arXiv_CV

Abstract
Abstract (translated)
URL
PDF

Abstract

We consider the problem of detecting and recognizing the objects observed by visitors (i.e., attended objects) in cultural sites from egocentric vision. A standard approach to the problem involves detecting all objects and selecting the one which best overlaps with the gaze of the visitor, measured through a gaze tracker. Since labeling large amounts of data to train a standard object detector is expensive in terms of costs and time, we propose a weakly supervised version of the task which leans only on gaze data and a frame-level label indicating the class of the attended object. To study the problem, we present a new dataset composed of egocentric videos and gaze coordinates of subjects visiting a museum. We hence compare three different baselines for weakly supervised attended object detection on the collected data. Results show that the considered approaches achieve satisfactory performance in a weakly supervised manner, which allows for significant time savings with respect to a fully supervised detector based on Faster R-CNN. To encourage research on the topic, we publicly release the code and the dataset at the following url: this https URL

Abstract (translated)

URL

https://arxiv.org/abs/2204.07090

PDF

https://arxiv.org/pdf/2204.07090.pdf