Paper Reading AI Learner

Towards Unsupervised Representation Learning: Learning, Evaluating and Transferring Visual Representations


Abstract

Unsupervised representation learning aims at finding methods that learn representations from data without annotation-based signals. Abstaining from annotations not only leads to economic benefits but may - and to some extent already does - result in advantages regarding the representation's structure, robustness, and generalizability to different tasks. In the long run, unsupervised methods are expected to surpass their supervised counterparts due to the reduction of human intervention and the inherently more general setup that does not bias the optimization towards an objective originating from specific annotation-based signals. While major advantages of unsupervised representation learning have been recently observed in natural language processing, supervised methods still dominate in vision domains for most tasks. In this dissertation, we contribute to the field of unsupervised (visual) representation learning from three perspectives: (i) Learning representations: We design unsupervised, backpropagation-free Convolutional Self-Organizing Neural Networks (CSNNs) that utilize self-organization- and Hebbian-based learning rules to learn convolutional kernels and masks to achieve deeper backpropagation-free models. (ii) Evaluating representations: We build upon the widely used (non-)linear evaluation protocol to define pretext- and target-objective-independent metrics for measuring and investigating the objective function mismatch between various unsupervised pretext tasks and target tasks. (iii) Transferring representations: We contribute CARLANE, the first 3-way sim-to-real domain adaptation benchmark for 2D lane detection, and a method based on prototypical self-supervised learning. Finally, we contribute a content-consistent unpaired image-to-image translation method that utilizes masks, global and local discriminators, and similarity sampling to mitigate content inconsistencies.
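The abstract only names the learning rules behind contribution (i); as a rough illustration, the sketch below shows a generic SOM-style, backpropagation-free update that learns a small bank of convolutional kernels from randomly sampled image patches. It is a minimal sketch of the general technique, not the dissertation's actual CSNN code: the patch extraction, the 1-D neighbourhood, and all hyperparameters are assumptions made here for illustration.

```python
# Illustrative sketch only (assumed details, not the dissertation's CSNN implementation):
# learn convolutional kernels from unlabeled image patches with a SOM-style,
# backpropagation-free competitive update.
import numpy as np

rng = np.random.default_rng(0)

def extract_patches(images, k=5, n_patches=2000):
    """Sample random k x k patches from a batch of grayscale images (N, H, W)."""
    n, h, w = images.shape
    idx = rng.integers(0, n, n_patches)
    ys = rng.integers(0, h - k + 1, n_patches)
    xs = rng.integers(0, w - k + 1, n_patches)
    return np.stack([images[i, y:y + k, x:x + k].ravel()
                     for i, y, x in zip(idx, ys, xs)])

def som_learn_kernels(patches, n_kernels=16, epochs=5, lr0=0.5, sigma0=2.0):
    """SOM-style competitive learning: each patch pulls its best-matching
    kernel (and, more weakly, that kernel's neighbours) towards itself."""
    dim = patches.shape[1]
    kernels = rng.normal(scale=0.1, size=(n_kernels, dim))
    grid = np.arange(n_kernels)                      # simple 1-D neighbourhood (assumption)
    for e in range(epochs):
        lr = lr0 * (1.0 - e / epochs)                # decaying learning rate
        sigma = sigma0 * (1.0 - e / epochs) + 1e-3   # shrinking neighbourhood
        for x in patches:
            bmu = np.argmin(np.linalg.norm(kernels - x, axis=1))   # best-matching unit
            h = np.exp(-((grid - bmu) ** 2) / (2 * sigma ** 2))    # neighbourhood weights
            kernels += lr * h[:, None] * (x - kernels)             # Hebbian-like pull
    return kernels

# Usage with dummy data standing in for an unlabeled image collection.
images = rng.random((32, 28, 28))
kernels = som_learn_kernels(extract_patches(images), n_kernels=16)
print(kernels.shape)  # (16, 25): sixteen 5x5 kernels learned without backpropagation
```

The learned kernels could then serve as a fixed convolutional layer whose outputs feed the (non-)linear evaluation protocol mentioned in contribution (ii).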

Abstract (translated)

Unsupervised representation learning aims to find methods that learn representations from data without annotation-based signals. Abstaining from annotations not only brings economic benefits but may, and to some extent already does, yield advantages in the representation's structure, robustness, and generalizability to different tasks. In the long run, unsupervised methods are expected to surpass their supervised counterparts because they reduce human intervention and use an inherently more general setup that does not bias the optimization towards an objective derived from specific annotation-based signals. While major advantages of unsupervised representation learning have recently been observed in natural language processing, supervised methods still dominate most tasks in vision domains. In this dissertation, we contribute to the field of unsupervised (visual) representation learning from three perspectives: (i) Learning representations: we design unsupervised, backpropagation-free Convolutional Self-Organizing Neural Networks (CSNNs) that use self-organization- and Hebbian-based learning rules to learn convolutional kernels and masks, enabling deeper backpropagation-free models. (ii) Evaluating representations: we build on the widely used (non-)linear evaluation protocol to define pretext- and target-objective-independent metrics for measuring and investigating the objective function mismatch between various unsupervised pretext tasks and target tasks. (iii) Transferring representations: we contribute CARLANE, the first 3-way sim-to-real domain adaptation benchmark for 2D lane detection, together with a method based on prototypical self-supervised learning. Finally, we contribute a content-consistent unpaired image-to-image translation method that uses masks, global and local discriminators, and similarity sampling to mitigate content inconsistencies.

URL

https://arxiv.org/abs/2312.00101

PDF

https://arxiv.org/pdf/2312.00101.pdf
