Paper Reading AI Learner

Joint Learning of Discriminative Low-dimensional Image Representations Based on Dictionary Learning and Two-layer Orthogonal Projections

2019-03-24 12:06:49
Xian Wei, Hao Shen, Yuanxiang Li, Xuan Tang, Bo Jin, Lijun Zhao, Yi Lu Murphey

Abstract

This work investigates the problem of efficiently learning discriminative low-dimensional representations of multiclass large-scale image objects. We propose a generic deep learning approach that takes advantage of Convolutional Neural Networks (CNNs), sparse dictionary learning, and orthogonal projections. CNNs are not only powerful feature extractors but also robust to spatial variance and changes. Sparse dictionary learning is well known for disentangling nonlinear underlying discriminative factors in data. Orthogonal projection is an efficient tool for projecting multi-class data onto a low-dimensional discriminative subspace. The proposed procedure can be summarized as follows. First, a CNN is employed to extract high-dimensional (HD) preliminary convolutional features. Second, to avoid the high computational cost of direct sparse coding on HD CNN features, we propose learning a sparse representation (SR) in an orthogonally projected space over a task-driven sparsifying dictionary. We then apply a discriminative projection to the SR. The whole learning process is treated as a joint optimization problem of trace quotient maximization, which involves the CNN parameters, the orthogonal projection on CNN features, the dictionary, and the discriminative projection on sparse codes. The related cost function is well defined on the product manifold of the Stiefel manifold, the Oblique manifold, and the Grassmann manifold, and it is optimized via a geometric stochastic gradient descent algorithm. Finally, the quality of the dictionary and projections learned by the proposed approach is further investigated in terms of image classification. The experimental results show that the approach achieves performance that is strongly competitive with state-of-the-art image classification methods.
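The abstract names trace quotient maximization over orthogonality-constrained matrices but does not give the objective itself. As a rough illustration only, the sketch below assumes the generic form tr(UᵀAU) / tr(UᵀBU) with U constrained to the Stiefel manifold (UᵀU = I), and takes plain full-batch gradient ascent steps followed by a QR retraction. The matrices `A` and `B`, all function names, the step size, and the synthetic data are assumptions made for this example and are not taken from the paper. The paper's joint cost also involves CNN parameters, an Oblique-manifold dictionary, and a Grassmann-manifold projection, and it uses stochastic updates, none of which are modeled here.

```python
import numpy as np


def trace_quotient(U, A, B):
    """Return tr(U^T A U) / tr(U^T B U)."""
    return np.trace(U.T @ A @ U) / np.trace(U.T @ B @ U)


def euclidean_grad(U, A, B):
    """Euclidean gradient of the trace quotient w.r.t. U (A and B symmetric)."""
    num = np.trace(U.T @ A @ U)
    den = np.trace(U.T @ B @ U)
    return 2.0 * ((A @ U) * den - (B @ U) * num) / den**2


def riemannian_grad(U, G):
    """Project a Euclidean gradient G onto the tangent space of the Stiefel manifold at U."""
    UtG = U.T @ G
    return G - U @ ((UtG + UtG.T) / 2.0)


def qr_retraction(X):
    """Map a full-column-rank matrix back onto the Stiefel manifold via thin QR."""
    Q, R = np.linalg.qr(X)
    # Flip column signs so that diag(R) is positive, which makes the result unique.
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs


def ascent_step(U, A, B, lr=0.05):
    """One Riemannian gradient ascent step: tangent direction, then retraction."""
    xi = riemannian_grad(U, euclidean_grad(U, A, B))
    return qr_retraction(U + lr * xi)


# Synthetic toy problem (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
d, k = 10, 3
M = rng.standard_normal((d, d))
A = M @ M.T / d                  # stand-in for a between-class scatter (symmetric PSD)
N = rng.standard_normal((d, d))
B = N @ N.T / d + np.eye(d)      # stand-in for a within-class scatter (symmetric PD)

U = qr_retraction(rng.standard_normal((d, k)))  # random starting point with U^T U = I
q_start = trace_quotient(U, A, B)
for _ in range(200):
    U = ascent_step(U, A, B, lr=0.05)
q_end = trace_quotient(U, A, B)
print("trace quotient at start:", q_start, "after 200 steps:", q_end)
```

The retraction is what keeps every iterate orthonormal, so the constraint never has to be enforced through a penalty term. A fixed step size is used for brevity; a line search or a decaying schedule would be the more careful choice.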


URL

https://arxiv.org/abs/1903.09977

PDF

https://arxiv.org/pdf/1903.09977.pdf

