Paper Reading AI Learner

Charades-Ego: A Large-Scale Dataset of Paired Third and First Person Videos

2018-04-30 16:57:00
Gunnar A. Sigurdsson, Abhinav Gupta, Cordelia Schmid, Ali Farhadi, Karteek Alahari

Abstract

In Actor and Observer we introduced a dataset linking the first and third-person video understanding domains, the Charades-Ego Dataset. In this paper we describe the egocentric aspect of the dataset and present annotations for Charades-Ego with 68,536 activity instances in 68.8 hours of first and third-person video, making it one of the largest and most diverse egocentric datasets available. Charades-Ego furthermore shares activity classes, scripts, and methodology with the Charades dataset, that consist of additional 82.3 hours of third-person video with 66,500 activity instances. Charades-Ego has temporal annotations and textual descriptions, making it suitable for egocentric video classification, localization, captioning, and new tasks utilizing the cross-modal nature of the data.

Abstract (translated)

在Actor和Observer中,我们引入了一个连接第一人和第三人视频理解域的数据集Charades-Ego Dataset。在本文中,我们描述了数据集的以自我为中心的方面,并在第一和第三人视频的68.8小时内为Charades-Ego提供了68,536个活动实例的注释,使其成为可用的最大和最多样化的以自我为中心的数据集之一。 Charades-Ego还与Charades数据集共享活动类别,脚本和方法,其中包括82.3小时的第三方视频和66,500个活动实例。 Charades-Ego具有时间注释和文本描述,适合以自我为中心的视频分类,本地化,字幕以及利用数据的跨模式性质的新任务。

URL

https://arxiv.org/abs/1804.09626

PDF

https://arxiv.org/pdf/1804.09626.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot