Paper Reading AI Learner

Controllable GUI Exploration

2025-02-05 16:25:35
Aryan Garg, Yue Jiang, Antti Oulasvirta

Abstract

During the early stages of interface design, designers need to produce multiple sketches to explore a design space. Design tools often fail to support this critical stage because they insist on specifying more detail than necessary. Although recent advances in generative AI have raised hopes of solving this issue, in practice they fall short because expressing loose ideas in a prompt is impractical. In this paper, we propose a diffusion-based approach to the low-effort generation of interface sketches. It breaks new ground by allowing flexible control of the generation process via three types of inputs: A) prompts, B) wireframes, and C) visual flows. The designer can provide any combination of these as input at any level of detail and will get a diverse gallery of low-fidelity solutions in response. The unique benefit is that large design spaces can be explored rapidly with very little effort in input specification. We present qualitative results for various combinations of input specifications. Additionally, we demonstrate that our model aligns more accurately with these specifications than other models.

URL

https://arxiv.org/abs/2502.03330

PDF

https://arxiv.org/pdf/2502.03330.pdf
