Abstract
In contemporary design practices, the integration of computer vision and generative artificial intelligence (genAI) represents a transformative shift towards more interactive and inclusive processes. These technologies offer new dimensions of image analysis and generation that are particularly relevant to urban landscape reconstruction. This paper presents a novel workflow, encapsulated within a prototype application named UrbanGenAI, designed to leverage the synergies between advanced image segmentation and diffusion models for a comprehensive approach to urban design. Our methodology combines the OneFormer model for detailed panoptic image segmentation with the Stable Diffusion XL (SDXL) diffusion model, applied through ControlNet, to generate images from textual descriptions. Validation results indicated strong performance by the prototype application, with high accuracy in both object detection and text-to-image generation, as evidenced by superior Intersection over Union (IoU) and CLIP scores across iterative evaluations of various categories of urban landscape features. Preliminary testing included utilising UrbanGenAI as an educational tool for enhancing the learning experience in design pedagogy, and as a participatory instrument for facilitating community-driven urban planning. Early results suggested that UrbanGenAI not only advances the technical frontier of urban landscape reconstruction but also offers significant pedagogical and participatory planning benefits. Ongoing development aims to further validate its effectiveness across broader contexts and to integrate additional features such as real-time feedback mechanisms and 3D modelling capabilities.

Keywords: generative AI; panoptic image segmentation; diffusion models; urban landscape design; design pedagogy; co-design
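The abstract describes a two-stage pipeline: panoptic segmentation of a site photograph with OneFormer, followed by text-guided image generation with SDXL conditioned through ControlNet, with outputs evaluated via IoU and CLIP scores. The paper's own implementation is not reproduced here; the following is a minimal sketch of how such a pipeline could be wired together with the Hugging Face transformers, diffusers, and torchmetrics libraries. The checkpoint names, the Canny-edge conditioning, the example prompt, and the file paths are all illustrative assumptions.

```python
# Hedged sketch of the two-stage workflow outlined in the abstract:
#   (1) panoptic segmentation with OneFormer,
#   (2) text-guided regeneration with SDXL via ControlNet,
#   (3) CLIP-score evaluation of prompt-image alignment.
# Checkpoints, prompt, and conditioning modality are assumptions, not the
# paper's published configuration.
import cv2
import numpy as np
import torch
from PIL import Image
from transformers import OneFormerProcessor, OneFormerForUniversalSegmentation
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline
from torchmetrics.multimodal.clip_score import CLIPScore

# --- Stage 1: panoptic segmentation of the existing streetscape ---
processor = OneFormerProcessor.from_pretrained("shi-labs/oneformer_ade20k_swin_large")
segmenter = OneFormerForUniversalSegmentation.from_pretrained(
    "shi-labs/oneformer_ade20k_swin_large"
)

street_photo = Image.open("site_photo.jpg").convert("RGB")  # placeholder input
inputs = processor(images=street_photo, task_inputs=["panoptic"], return_tensors="pt")
with torch.no_grad():
    outputs = segmenter(**inputs)

# Post-process into a label map plus per-segment metadata (class ids, areas),
# i.e. the object-detection information the abstract scores with IoU.
panoptic = processor.post_process_panoptic_segmentation(
    outputs, target_sizes=[street_photo.size[::-1]]
)[0]
label_map = panoptic["segmentation"]   # HxW tensor of segment ids
segments = panoptic["segments_info"]   # list of dicts describing each segment

# --- Stage 2: text-guided regeneration conditioned on scene structure ---
# The abstract does not name the conditioning modality; Canny edges are used
# here only because a public SDXL ControlNet checkpoint exists for them.
edges = cv2.Canny(np.array(street_photo), 100, 200)
control_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

prompt = "a pedestrian-friendly street with mature trees and cycle lanes"  # example brief
redesign = pipe(
    prompt,
    image=control_image,
    controlnet_conditioning_scale=0.5,
    num_inference_steps=30,
).images[0]
redesign.save("redesigned_street.png")

# --- Evaluation sketch: CLIP score for prompt-image alignment ---
clip_metric = CLIPScore(model_name_or_path="openai/clip-vit-base-patch16")
image_tensor = torch.from_numpy(np.array(redesign)).permute(2, 0, 1).unsqueeze(0)
print(float(clip_metric(image_tensor, [prompt])))
```

Conditioning the diffusion pass on structure extracted from the site photograph is what allows a textual brief to re-imagine the scene while preserving its spatial layout, which matches the abstract's framing of segmentation and generation as complementary stages.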
URL
https://arxiv.org/abs/2401.14379