UrbanGenAI: Reconstructing Urban Landscapes using Panoptic Segmentation and Diffusion Models


Abstract

In contemporary design practices, the integration of computer vision and generative artificial intelligence (genAI) represents a transformative shift towards more interactive and inclusive processes. These technologies offer new dimensions of image analysis and generation, which are particularly relevant in the context of urban landscape reconstruction. This paper presents a novel workflow encapsulated within a prototype application, designed to leverage the synergies between advanced image segmentation and diffusion models for a comprehensive approach to urban design. Our methodology encompasses the OneFormer model for detailed image segmentation and the Stable Diffusion XL (SDXL) diffusion model, implemented through ControlNet, for generating images from textual descriptions. Validation results indicated a high degree of performance by the prototype application, showcasing significant accuracy in both object detection and text-to-image generation. This was evidenced by superior Intersection over Union (IoU) and CLIP scores across iterative evaluations for various categories of urban landscape features. Preliminary testing included utilising UrbanGenAI as an educational tool enhancing the learning experience in design pedagogy, and as a participatory instrument facilitating community-driven urban planning. Early results suggested that UrbanGenAI not only advances the technical frontiers of urban landscape reconstruction but also provides significant pedagogical and participatory planning benefits. The ongoing development of UrbanGenAI aims to further validate its effectiveness across broader contexts and integrate additional features such as real-time feedback mechanisms and 3D modelling capabilities.

Keywords: generative AI; panoptic image segmentation; diffusion models; urban landscape design; design pedagogy; co-design
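
The abstract reports segmentation quality as Intersection over Union (IoU). As a rough illustration of what that metric measures, the sketch below computes IoU for two small binary masks in plain Python. It is not the authors' evaluation code: the function name `mask_iou`, the toy masks, and the convention of returning 1.0 when both masks are empty are all assumptions made for this example.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # rows of 0/1 pixel labels


def mask_iou(pred: Mask, truth: Mask) -> float:
    """Return intersection-over-union of two equally shaped binary masks."""
    if len(pred) != len(truth) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, truth)
    ):
        raise ValueError("masks must have the same shape")

    intersection = 0  # pixels set in both masks
    union = 0         # pixels set in at least one mask
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            p_on, t_on = bool(p), bool(t)
            intersection += p_on and t_on
            union += p_on or t_on

    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


# Hypothetical 3x3 masks for a single category (e.g. "tree").
predicted = [
    [0, 1, 1],
    [0, 1, 1],
    [0, 0, 0],
]
reference = [
    [0, 0, 1],
    [0, 1, 1],
    [0, 0, 1],
]

print(mask_iou(predicted, reference))  # 3 shared pixels / 5 covered pixels
```

In a per-category evaluation such as the one the abstract describes, a function like this would be applied once per class mask and the results compared across categories.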

URL

https://arxiv.org/abs/2401.14379

PDF

https://arxiv.org/pdf/2401.14379.pdf

