Paper Reading AI Learner

SUVR: A Search-based Approach to Unsupervised Visual Representation Learning

2023-05-24 05:57:58
Yi-Zhan Xu, Chih-Yao Chen, Cheng-Te Li

Abstract

Unsupervised learning has grown in popularity because of the difficulty of collecting annotated data and the development of modern frameworks that allow learning from unlabeled data. Existing studies, however, either disregard variations at different levels of similarity or only consider negative samples drawn from a single batch. We argue that image pairs should have varying degrees of similarity and that negative samples should be allowed to come from the entire dataset. In this work, we propose Search-based Unsupervised Visual Representation Learning (SUVR) to learn better image representations in an unsupervised manner. We first construct a graph over the image dataset based on the similarity between images and adopt the concept of graph traversal to explore positive samples. Meanwhile, we ensure that negative samples can be drawn from the full dataset. Quantitative experiments on five benchmark image classification datasets demonstrate that SUVR significantly outperforms strong competing methods on unsupervised embedding learning. Qualitative experiments also show that SUVR produces better representations, in which similar images are clustered closer together than unrelated images in the latent space.
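
Below is a minimal, illustrative sketch of the two ideas described in the abstract: building a similarity graph over the images, exploring positive samples via graph traversal, and drawing negative samples from the whole dataset rather than a single batch. This is not the paper's implementation; the cosine-similarity graph, the breadth-first traversal, and the hyperparameters (k, depth, number of negatives) are assumptions chosen only to make the idea concrete.

```python
import numpy as np

def build_similarity_graph(embeddings, k=5):
    """Connect each image to its k most similar neighbours (cosine similarity).

    The paper builds a graph from image similarities; cosine similarity and
    k-nearest-neighbour edges are assumptions used here for illustration.
    """
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    np.fill_diagonal(sims, -np.inf)               # exclude self-similarity
    neighbours = np.argsort(-sims, axis=1)[:, :k]
    return {i: list(neighbours[i]) for i in range(len(embeddings))}

def traverse_positives(graph, anchor, depth=2):
    """Breadth-first traversal: nodes reachable within `depth` hops are treated
    as positive samples with (implicitly) decreasing similarity to the anchor."""
    frontier, visited = {anchor}, {anchor}
    for _ in range(depth):
        frontier = {n for node in frontier for n in graph[node]} - visited
        visited |= frontier
    visited.discard(anchor)
    return visited

def sample_negatives(n_images, positives, anchor, size=16, rng=None):
    """Draw negatives uniformly from the full dataset, not just the current batch."""
    rng = rng or np.random.default_rng()
    candidates = np.setdiff1d(np.arange(n_images), list(positives) + [anchor])
    return rng.choice(candidates, size=min(size, len(candidates)), replace=False)

# Toy usage: random vectors stand in for encoder embeddings of 100 images.
emb = np.random.default_rng(0).normal(size=(100, 32))
graph = build_similarity_graph(emb, k=5)
pos = traverse_positives(graph, anchor=0, depth=2)
neg = sample_negatives(len(emb), pos, anchor=0, size=16)
```

In practice the similarity would come from the learned encoder's representations and would change as training progresses; the sketch only mirrors the positive/negative sampling logic, not the training objective.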

URL

https://arxiv.org/abs/2305.14754

PDF

https://arxiv.org/pdf/2305.14754.pdf

