Abstract
Detecting interacting and conversational groups in images has applications in video surveillance and social robotics. In this paper we build on prior approaches that find conversational groups by detecting social gathering spaces, called o-spaces, which are then used to assign people to groups. Our contributions are twofold: this is the first work to incorporate features extracted from the room layout image, and the first to use a deep network to generate an image representation of the proposed o-spaces. Specifically, the new network builds on the PointNet architecture, which accepts unordered inputs of variable size. We report accuracies that rival and sometimes exceed those of the best models, but because of a data imbalance issue our test results do not yet outperform existing models.
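The property attributed to PointNet above, accepting unordered inputs of variable size, can be illustrated with a minimal sketch. This is not the paper's network: the function name `encode_set`, the 2-D point coordinates, the feature width of 8, and the random weights are all assumptions chosen purely for illustration. The sketch only shows the general idea of a shared per-point transform followed by a symmetric (max) pooling step.

```python
import numpy as np

def encode_set(points, weights, bias):
    """Map an (n, d) array of unordered points to a fixed-size feature vector."""
    # Shared per-point transform: the same weights are applied to every row.
    per_point = np.maximum(points @ weights + bias, 0.0)  # ReLU activation
    # Symmetric pooling: a max over the point axis does not depend on
    # the order of the rows or on how many rows there are.
    return per_point.max(axis=0)

rng = np.random.default_rng(0)
weights = rng.normal(size=(2, 8))  # hypothetical: 2-D inputs, 8 output features
bias = rng.normal(size=8)

group_a = rng.normal(size=(3, 2))  # a set of 3 points
group_b = rng.normal(size=(5, 2))  # a set of 5 points

feat_a = encode_set(group_a, weights, bias)
feat_a_shuffled = encode_set(group_a[::-1], weights, bias)  # same set, reversed order
feat_b = encode_set(group_b, weights, bias)
```

Here `feat_a` and `feat_a_shuffled` agree because the max ignores row order, and `feat_a` and `feat_b` have the same length even though the two sets differ in size.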
URL
https://arxiv.org/abs/1810.04039