Temporal Unet: Sample Level Human Action Recognition using WiFi

2019-04-19 21:23:28
Fei Wang, Yunpeng Song, Jimuyang Zhang, Jinsong Han, Dong Huang

Abstract

Human actions distort WiFi signals, and this distortion has been widely exploited for action recognition tasks such as elderly fall detection, sign language recognition, and keystroke estimation. To the best of our knowledge, past work recognizes human actions by classifying one complete distortion series as one action, which we term series-level action recognition. In this paper, we introduce a much more fine-grained and challenging action recognition task into the WiFi sensing domain: sample-level action recognition. In this task, every WiFi distortion sample in the whole series must be classified as an action, which is a critical technique for precise action localization, continuous action segmentation, and real-time action recognition. To achieve WiFi-based sample-level action recognition, we thoroughly analyze approaches in image-based semantic segmentation and in video-based frame-level action recognition, and then propose a simple yet efficient deep convolutional neural network, Temporal Unet. Experimental results show that Temporal Unet performs this novel task well. Code is publicly available at https://github.com/geekfeiw/WiSLAR.
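The practical difference between the two task formulations is in the output: series-level recognition yields a single label for the whole series, while sample-level recognition yields one label per WiFi sample, from which action boundaries can be read off directly. The sketch below is illustrative only and is not taken from the WiSLAR code. It assumes a model (such as Temporal Unet) has already produced one integer action label per sample. The helper names `segments_from_labels` and `sample_accuracy` and the label encoding (0 = no action) are hypothetical.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def segments_from_labels(labels: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Collapse per-sample action labels into (action, start, end) runs.

    `end` is exclusive, so a run covers samples start .. end - 1.
    This is the localization/segmentation information that a single
    series-level label cannot provide.
    """
    segments = []
    start = 0
    for action, run in groupby(labels):
        length = sum(1 for _ in run)  # number of consecutive samples with this label
        segments.append((action, start, start + length))
        start += length
    return segments


def sample_accuracy(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the ground truth."""
    if len(pred) != len(truth):
        raise ValueError("prediction and ground truth must have the same length")
    if not truth:
        raise ValueError("cannot score an empty series")
    correct = sum(p == t for p, t in zip(pred, truth))
    return correct / len(truth)


# Hypothetical per-sample output: 0 = no action, 1 and 2 = two different actions.
labels = [0, 0, 0, 2, 2, 2, 2, 0, 0, 1, 1]
print(segments_from_labels(labels))
# → [(0, 0, 3), (2, 3, 7), (0, 7, 9), (1, 9, 11)]
```

Because every sample carries its own label, the start and end of each action fall out of a simple run-length pass. A series-level classifier would assign this entire series a single label and discard those boundaries.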

URL

https://arxiv.org/abs/1904.11953

PDF

https://arxiv.org/pdf/1904.11953.pdf
