Paper Reading AI Learner

CoCoG: Controllable Visual Stimuli Generation based on Human Concept Representations

2024-04-25 10:10:48
Chen Wei, Jiachen Zou, Dietmar Heinke, Quanying Liu

Abstract

A central question for cognitive science is to understand how humans process visual objects, i.e., to uncover the human low-dimensional concept representation space underlying high-dimensional visual stimuli. Generating visual stimuli with controllable concepts is key. However, there are currently no generative models in AI that solve this problem. Here, we present the Concept based Controllable Generation (CoCoG) framework. CoCoG consists of two components: a simple yet efficient AI agent for extracting interpretable concepts and predicting human decision-making in visual similarity judgment tasks, and a conditional generation model for generating visual stimuli given the concepts. We quantify the performance of CoCoG from two aspects: human behavior prediction accuracy and controllable generation ability. The experiments with CoCoG indicate that 1) the reliable concept embeddings in CoCoG allow predicting human behavior with 64.07% accuracy on the THINGS-similarity dataset; 2) CoCoG can generate diverse objects through the control of concepts; and 3) CoCoG can manipulate human similarity judgment behavior by intervening on key concepts. CoCoG offers visual objects with controllable concepts to advance our understanding of causality in human cognition. The code of CoCoG is available at this https URL.
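To make the first component concrete: the THINGS-similarity setting typically uses triplet odd-one-out judgments, which can be predicted from low-dimensional concept embeddings via pairwise similarities. The sketch below illustrates this idea; the embedding values, the dot-product similarity, and the softmax choice rule are illustrative assumptions, not CoCoG's exact formulation.

```python
import numpy as np

def predict_odd_one_out(e1, e2, e3):
    """Return P(item i is the odd one out) for a triplet.

    Assumed choice rule (illustrative, not CoCoG's exact model):
    the pair with the highest embedding dot product is judged most
    similar (softmax), and the remaining item is the odd one out.
    """
    embeddings = [e1, e2, e3]
    pairs = [(0, 1), (0, 2), (1, 2)]
    # Pairwise similarities s_ij = e_i . e_j
    sims = np.array([embeddings[i] @ embeddings[j] for i, j in pairs])
    # Softmax probability that each pair is chosen as "most similar"
    p_pair = np.exp(sims - sims.max())
    p_pair /= p_pair.sum()
    # The odd one out is the item left out of the chosen pair
    p_odd = np.zeros(3)
    for (i, j), p in zip(pairs, p_pair):
        p_odd[3 - i - j] = p  # index not in the pair
    return p_odd

# Hypothetical non-negative concept weights for three objects
e_dog = np.array([2.0, 0.1, 0.0])
e_cat = np.array([1.8, 0.2, 0.1])
e_car = np.array([0.0, 0.1, 2.2])
probs = predict_odd_one_out(e_dog, e_cat, e_car)
```

Because "dog" and "cat" share a dominant concept dimension, the model assigns the highest odd-one-out probability to "car"; intervening on a key concept dimension would shift these probabilities, which is the kind of manipulation the abstract describes.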

Abstract (translated)

A central question for cognitive science is to understand how humans process visual objects, i.e., to uncover the human low-dimensional concept representation space from high-dimensional visual stimuli. Generating visual stimuli with controllable concepts is key. However, there is currently no generative model in AI that solves this problem. Here, we present the Concept based Controllable Generation (CoCoG) framework. CoCoG consists of two components: one is a simple yet efficient AI agent for extracting interpretable concepts and predicting human decision-making in visual similarity judgment tasks; the other is a conditional generation model for generating visual stimuli given the concepts. We evaluate CoCoG's performance from two aspects: human behavior prediction accuracy and controllable generation ability. Experiments with CoCoG show that 1) the reliable concept embeddings in CoCoG enable predicting human behavior with 64.07% accuracy on the THINGS-similarity dataset; 2) by controlling concepts, CoCoG can generate diverse objects; and 3) by intervening on key concepts, CoCoG can manipulate human similarity judgment behavior. CoCoG provides visual objects with controllable concepts to advance our understanding of causality in human cognition. The code of CoCoG is available here: this link.

URL

https://arxiv.org/abs/2404.16482

PDF

https://arxiv.org/pdf/2404.16482.pdf

