Paper Reading AI Learner

Equipping Sketch Patches with Context-Aware Positional Encoding for Graphic Sketch Representation

2024-03-26 09:26:12
Sicong Zang, Zhijun Fang

Abstract

The drawing order of a sketch records how it was created, stroke by stroke, by a human being. For graphic sketch representation learning, recent studies have injected sketch drawing orders into graph edge construction by linking each patch to another according to a temporal-based nearest-neighbor strategy. However, edges constructed this way may be unreliable, since the same sketch may be drawn in several different ways (drawing variants). In this paper, we propose a variant-drawing-protected method that equips sketch patches with context-aware positional encoding (PE) to make better use of drawing orders for learning graphic sketch representations. Instead of injecting sketch drawings into graph edges, we embed this sequential information into graph nodes only. More specifically, each patch embedding is equipped with a sinusoidal absolute PE to highlight its sequential position in the drawing order. Its neighboring patches, ranked by the self-attention scores between patch embeddings, are equipped with learnable relative PEs to restore their contextual positions within the neighborhood. During message aggregation via graph convolutional networks, a node receives from its neighbors both semantic content (from patch embeddings) and contextual patterns (from PEs), arriving at drawing-order-enhanced sketch representations. Experimental results indicate that our method significantly improves sketch healing and controllable sketch synthesis.
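
The pipeline described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation, and every function name here (sinusoidal_pe, attention_scores, top_k_neighbors, aggregate) is hypothetical. Several simplifications are assumptions: attention scores are plain scaled dot products without learned query/key projections, neighbors are ranked on the raw patch embeddings, the "learnable" relative PEs are passed in as fixed vectors, the center node carries no relative PE, and the graph convolution is reduced to a weight-free mean aggregation.

```python
import math
import random


def sinusoidal_pe(position, dim):
    """Transformer-style sinusoidal absolute PE for one position in the drawing order."""
    pe = []
    for i in range(dim):
        angle = position / (10000 ** (2 * (i // 2) / dim))
        pe.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return pe


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def attention_scores(embeddings):
    """Scaled dot-product scores between all pairs of patch embeddings.
    Assumption: no learned query/key projections, for brevity."""
    scale = math.sqrt(len(embeddings[0]))
    return [[sum(q_d * k_d for q_d, k_d in zip(q, k)) / scale for k in embeddings]
            for q in embeddings]


def top_k_neighbors(scores, i, k):
    """The k patches (excluding i) with the highest score w.r.t. patch i, best first."""
    others = [j for j in range(len(scores)) if j != i]
    return sorted(others, key=lambda j: scores[i][j], reverse=True)[:k]


def aggregate(embeddings, k, rel_pe):
    """One GCN-style mean-aggregation step (no weight matrix or nonlinearity).

    embeddings: patch embeddings listed in drawing order.
    rel_pe[r]: relative PE for the neighbour ranked r; learnable in the paper,
    passed in here as fixed vectors (hypothetical simplification).
    """
    dim = len(embeddings[0])
    # Neighbours are ranked on the raw patch embeddings (assumption).
    scores = attention_scores(embeddings)
    # Each node carries the absolute PE of its position in the drawing order.
    x = [add(e, sinusoidal_pe(pos, dim)) for pos, e in enumerate(embeddings)]
    out = []
    for i in range(len(x)):
        messages = [x[i]]
        for rank, j in enumerate(top_k_neighbors(scores, i, k)):
            # Neighbour j sends its content plus the PE of its contextual rank.
            messages.append(add(x[j], rel_pe[rank]))
        out.append([sum(col) / len(messages) for col in zip(*messages)])
    return out


if __name__ == "__main__":
    random.seed(0)
    n_patches, dim, k = 6, 8, 3
    patches = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_patches)]
    rel = [[random.gauss(0, 0.1) for _ in range(dim)] for _ in range(k)]
    h = aggregate(patches, k, rel)
    print(len(h), len(h[0]))  # 6 8
```

The design point the sketch tries to show is that drawing-order information lives only in the node features (absolute PE by stroke position, relative PE by attention rank), while the neighborhood itself is chosen by content similarity rather than by temporal adjacency.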

URL

https://arxiv.org/abs/2403.17525

PDF

https://arxiv.org/pdf/2403.17525.pdf