Paper Reading AI Learner

ChiroDiff: Modelling chirographic data with Diffusion Models

2023-04-07 15:17:48
Ayan Das, Yongxin Yang, Timothy Hospedales, Tao Xiang, Yi-Zhe Song

Abstract

Generative modelling over continuous-time geometric constructs, a.k.a such as handwriting, sketches, drawings etc., have been accomplished through autoregressive distributions. Such strictly-ordered discrete factorization however falls short of capturing key properties of chirographic data -- it fails to build holistic understanding of the temporal concept due to one-way visibility (causality). Consequently, temporal data has been modelled as discrete token sequences of fixed sampling rate instead of capturing the true underlying concept. In this paper, we introduce a powerful model-class namely "Denoising Diffusion Probabilistic Models" or DDPMs for chirographic data that specifically addresses these flaws. Our model named "ChiroDiff", being non-autoregressive, learns to capture holistic concepts and therefore remains resilient to higher temporal sampling rate up to a good extent. Moreover, we show that many important downstream utilities (e.g. conditional sampling, creative mixing) can be flexibly implemented using ChiroDiff. We further show some unique use-cases like stochastic vectorization, de-noising/healing, abstraction are also possible with this model-class. We perform quantitative and qualitative evaluation of our framework on relevant datasets and found it to be better or on par with competing approaches.

Abstract (translated)

连续性时间几何构造的生成建模,例如手写、 Sketch、绘图等,是通过自回归分布完成的。然而,这种严格排序的离散 factorization 未能捕捉到的图像数据的关键特性——由于单向可视化(因果性),它无法构建整体理解,因此,时间数据被建模为固定采样率的离散 token 序列,而不是捕捉真正 underlying 概念。在本文中,我们介绍了一种强大的模型类别,名为“去噪扩散概率模型”(DDPM),专门解决了图像数据的这些缺陷。我们的模型名为“ChiroDiff”,它不是自回归的,因此学习捕捉整体概念,因此可以很好地适应更高的时间采样率。此外,我们展示了许多重要的后续功能(例如条件采样、创造性混合)可以使用 ChiroDiff 灵活实现。我们还展示了一些独特的应用场景,例如随机向量化、去噪/修复、抽象等。我们对这些模型类别在相关数据集上进行定量和定性评估,并发现它更好或与竞争方法相当。

URL

https://arxiv.org/abs/2304.03785

PDF

https://arxiv.org/pdf/2304.03785.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot