Paper Reading AI Learner

Conversational Group Detection With Deep Convolutional Networks

2019-05-05 16:52:03
Mason Swofford, John Peruzzi, Marynel Vázquez

Abstract

Detection of interacting and conversational groups from images has applications in video surveillance and social robotics. In this paper we build on prior attempts to find conversational groups by detection of social gathering spaces called o-spaces used to assign people to groups. As our contributions to the task, we are the first paper to incorporate features extracted from the room layout image, and the first to incorporate a deep network to generate an image representation of the proposed o-spaces. Specifically, this novel network builds on the PointNet architecture which allows unordered inputs of variable sizes. We present accuracies which demonstrate the ability to rival and sometimes outperform the best models, but due to a data imbalance issue we do not yet outperform existing models in our test results.

Abstract (translated)

从图像中检测交互组和会话组在视频监控和社会机器人学中有应用。本文建立在先前尝试的基础上,通过检测社会聚集空间来发现会话组,称为O-空间,用于将人分配到组中。作为我们对这项任务的贡献,我们是第一篇将从房间布局图像中提取的特征合并起来的论文,也是第一篇将深度网络合并起来生成所提议的O空间的图像表示的论文。具体地说,这个新的网络建立在Pointnet体系结构之上,它允许不同大小的无序输入。我们提供的精确度表明了与最佳模型竞争的能力,有时优于最佳模型,但由于数据不平衡问题,我们的测试结果还没有优于现有模型。

URL

https://arxiv.org/abs/1810.04039

PDF

https://arxiv.org/pdf/1810.04039.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot