Emergence of Painting Ability via Recognition-Driven Evolution

2025-01-09 04:37:31
Yi Lin, Lin Gu, Ziteng Cui, Shenghan Su, Yumo Hao, Yingtao Tian, Tatsuya Harada, Jianfei Yang

Abstract

From Paleolithic cave paintings to Impressionism, human painting has evolved to depict increasingly complex and detailed scenes and to convey more nuanced messages. This paper attempts to make this artistic capability emerge by simulating the evolutionary pressures that favour efficient visual communication. Specifically, we present a model with a stroke branch and a palette branch that together simulate human-like painting. The palette branch learns a limited colour palette, while the stroke branch parameterises each stroke as a Bézier curve and renders the strokes into an image, which is then evaluated by a high-level recognition module. We quantify the efficiency of visual communication as the recognition accuracy achieved by machine vision. The model then optimises the control points and colour choice of each stroke to maximise recognition accuracy with as few strokes and colours as possible. Experimental results show that our model achieves superior performance in high-level recognition tasks while delivering artistic expression and aesthetic appeal, especially in abstract sketches. Additionally, our approach shows promise as an efficient bit-level image compression technique, outperforming traditional methods.
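
To make the stroke and palette parameterisation concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not state the curve degree, so a cubic Bézier curve with four control points is assumed, and the names `cubic_bezier`, `sample_stroke` and `nearest_palette_index` are hypothetical helpers introduced only for illustration. The sketch samples points along one stroke and snaps an arbitrary RGB colour to the closest entry of a small palette; it does not cover rendering, the recognition module, or the optimisation.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Color = Tuple[int, int, int]


def cubic_bezier(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """Evaluate a cubic Bezier curve at parameter t in [0, 1].

    Assumption: strokes are cubic (four control points); the abstract
    only says "Bezier curves" without giving the degree.
    """
    u = 1.0 - t
    # Bernstein basis weights for a cubic curve.
    b0, b1, b2, b3 = u * u * u, 3 * u * u * t, 3 * u * t * t, t * t * t
    x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0]
    y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
    return (x, y)


def sample_stroke(control_points: Sequence[Point], n_samples: int = 16) -> List[Point]:
    """Return n_samples points evenly spaced in t along one stroke."""
    if len(control_points) != 4:
        raise ValueError("this sketch expects exactly four control points")
    if n_samples < 2:
        raise ValueError("n_samples must be at least 2")
    p0, p1, p2, p3 = control_points
    return [cubic_bezier(p0, p1, p2, p3, i / (n_samples - 1)) for i in range(n_samples)]


def nearest_palette_index(color: Color, palette: Sequence[Color]) -> int:
    """Index of the palette entry closest to color (squared RGB distance)."""
    return min(
        range(len(palette)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(color, palette[i])),
    )


if __name__ == "__main__":
    stroke = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
    for point in sample_stroke(stroke, n_samples=5):
        print(point)
    palette = [(0, 0, 0), (255, 0, 0), (255, 255, 255)]
    print(nearest_palette_index((250, 10, 10), palette))
```

In a setup like the one the abstract describes, the control points and the palette entries would be the quantities being optimised against the recognition score; here they are fixed inputs so that the geometry can be inspected on its own.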

URL

https://arxiv.org/abs/2501.04966

PDF

https://arxiv.org/pdf/2501.04966.pdf

