Paper Reading AI Learner

CRC-RL: A Novel Visual Feature Representation Architecture for Unsupervised Reinforcement Learning

2023-01-31 08:41:18
Darshita Jain, Anima Majumder, Samrat Dutta, Swagat Kumar

Abstract

This paper addresses the problem of visual feature representation learning with the aim of improving the performance of end-to-end reinforcement learning (RL) models. Specifically, a novel architecture is proposed that uses a heterogeneous loss function, called the CRC loss, to learn improved visual features that can then be used for policy learning in RL. The CRC loss is a combination of three individual loss functions: contrastive, reconstruction, and consistency loss. The feature representation is learned in parallel with policy learning, sharing weight updates through a Siamese twin encoder model. This encoder is augmented with a decoder network and a feature projection network to facilitate computation of the above loss components. Through an empirical analysis involving latent feature visualization, an attempt is made to provide insight into the role played by this loss function in learning new action-dependent features and how they are linked to the complexity of the problems being solved. The proposed architecture, called CRC-RL, is shown to outperform existing state-of-the-art methods on the challenging DeepMind Control Suite environments by a significant margin, thereby establishing a new benchmark in this field.
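The combined loss described above can be sketched as a weighted sum of a contrastive (InfoNCE-style) term, a reconstruction term, and a consistency term. The sketch below is illustrative only: the function names, the mean-squared-error choice for the reconstruction and consistency terms, and the component weights are assumptions, not values taken from the paper.

```python
import numpy as np

def contrastive_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style loss: each anchor should match its own positive.

    Uses cosine-similarity logits with the diagonal as the correct class.
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def mse(x, y):
    """Mean-squared error, used here for both reconstruction and consistency."""
    return np.mean((x - y) ** 2)

def crc_loss(z_q, z_k, obs, recon, proj_q, proj_k,
             w_contrastive=1.0, w_recon=1.0, w_consistency=1.0):
    """Hypothetical weighted combination of the three CRC components.

    z_q, z_k     : latent features from the two encoder branches
    obs, recon   : observation and its decoder reconstruction
    proj_q/proj_k: projected features compared by the consistency term
    """
    return (w_contrastive * contrastive_loss(z_q, z_k)
            + w_recon * mse(obs, recon)
            + w_consistency * mse(proj_q, proj_k))
```

In practice each term would be computed on batches of augmented observations and backpropagated through the shared encoder; the weights here are equal purely for illustration.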


URL

https://arxiv.org/abs/2301.13473

PDF

https://arxiv.org/pdf/2301.13473.pdf

