Object Localization with a Weakly Supervised CapsNet

2019-03-26 00:21:16
Weitang Liu, Emad Barsoum, John D. Owens

Abstract

Inspired by CapsNet's routing-by-agreement mechanism and its ability to learn object properties, we propose a CapsNet architecture with object coordinate atoms and a modified routing-by-agreement algorithm with unevenly distributed initial routing probabilities. The model is based on CapsNet but uses the routing algorithm to find the objects' approximate positions in the image coordinate system. We also discuss how to derive the property of translation through coordinate atoms, and we show the importance of sparse representation. We train our model on the single-digit moving MNIST dataset with class labels. Our model learns and derives the coordinates of the digits better than its convolutional counterpart, which lacks a routing-by-agreement algorithm, and it also performs well when tested on the multi-digit moving MNIST datasets. When deriving the coordinates, our model performs at least 13%, 24%, and 51% better than the ConvNet counterpart and the ResNet-20 benchmark on the 1-digit, 2-digit, and 3-digit moving MNIST datasets, respectively. This shows that our method transfers better to unseen scenarios in new but related datasets. We also achieve slightly better performance than the ResNet benchmark on the KTH dataset; these results show that our method reaches state-of-the-art performance on object localization without the extra localization techniques and modules used in prior work.
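
The routing step the abstract builds on can be sketched as follows. This is a minimal pure-Python sketch of the standard dynamic routing-by-agreement of Sabour et al. (2017), not the authors' modified algorithm. The `initial_logits` parameter is a hypothetical hook showing where unevenly distributed initial routing probabilities would enter; the abstract does not say how the paper chooses them. Coordinate atoms are not modeled here.

```python
import math
from typing import List, Optional

Vector = List[float]


def squash(s: Vector, eps: float = 1e-9) -> Vector:
    """Squash nonlinearity: keeps the direction of s, maps its length into [0, 1)."""
    norm_sq = sum(x * x for x in s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq + eps)
    return [scale * x for x in s]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax over one lower capsule's routing logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(u_hat: List[List[Vector]], iterations: int = 3,
          initial_logits: Optional[List[Vector]] = None) -> List[Vector]:
    """Routing-by-agreement over prediction vectors.

    u_hat[i][j] is lower capsule i's prediction vector for upper capsule j.
    initial_logits[i][j] (hypothetical parameter) sets the starting logits b_ij:
    all zeros gives the uniform start of Sabour et al. (2017); non-zero values
    give an uneven initial routing distribution.
    """
    if iterations < 1:
        raise ValueError("iterations must be >= 1")
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    if initial_logits is None:
        b = [[0.0] * n_out for _ in range(n_in)]
    else:
        b = [list(row) for row in initial_logits]
    v: List[Vector] = []
    for it in range(iterations):
        # Coupling coefficients: each lower capsule splits its output over upper capsules.
        c = [softmax(row) for row in b]
        # Weighted sum of predictions per upper capsule, then squash.
        v = []
        for j in range(n_out):
            s = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
            v.append(squash(s))
        if it < iterations - 1:
            # Agreement update: raise b_ij when prediction u_hat[i][j] aligns with v[j].
            for i in range(n_in):
                for j in range(n_out):
                    b[i][j] += sum(p * q for p, q in zip(u_hat[i][j], v[j]))
    return v
```

For example, if two lower capsules agree on upper capsule 0 (both predict `[1.0, 0.0]`) but disagree on upper capsule 1 (`[0.0, 1.0]` vs `[0.0, -1.0]`), routing shifts coupling toward capsule 0 and capsule 1's output vanishes. Passing `initial_logits=[[2.0, 0.0]]` for a single lower capsule biases routing toward upper capsule 0 from the first iteration, which is the kind of uneven starting distribution the abstract refers to.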

URL

https://arxiv.org/abs/1805.07706

PDF

https://arxiv.org/pdf/1805.07706.pdf