Abstract
While Large Language Model (LLM) agents show great potential for automated UI navigation tasks such as automated UI testing and AI assistants, their efficiency has been largely overlooked. Our motivating study reveals that inefficient UI representation creates a critical performance bottleneck. However, UI representation optimization, formulated as the task of automatically generating programs that transform UI representations, faces two unique challenges. First, the lack of Boolean oracles, which traditional program synthesis uses to decisively validate semantic correctness, poses a fundamental obstacle to co-optimizing token efficiency and completeness. Second, the optimizer must process large, complex UI trees as input while generating long, compositional transformation programs, making the search space vast and error-prone. To address these challenges, we present UIFormer, the first automated optimization framework that synthesizes UI transformation programs by conducting constraint-based optimization with structured decomposition of the complex synthesis task. First, UIFormer restricts the program space using a domain-specific language (DSL) that captures UI-specific operations. Second, UIFormer conducts LLM-based iterative refinement guided by correctness and efficiency rewards, steering the search toward efficiency-completeness co-optimization. UIFormer operates as a lightweight plugin that applies transformation programs, enabling seamless integration with existing LLM agents and requiring minimal modifications to their core logic. Evaluations across three UI navigation benchmarks spanning Android and Web platforms with five LLMs demonstrate that UIFormer achieves 48.7% to 55.8% token reduction with minimal runtime overhead while maintaining or improving agent performance. Real-world industry deployment at WeChat further validates the practical impact of UIFormer.
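To make the idea of a UI transformation program concrete, here is a minimal sketch (not the paper's actual DSL) of composable passes over a JSON-like UI tree that trade verbosity for token efficiency: dropping invisible subtrees, whitelisting attributes, and collapsing layout-only wrapper nodes. All function names, attribute names, and the tree schema below are illustrative assumptions.

```python
import json

# Assumed node schema: a dict with optional "visible", "text", "class",
# "bounds", and a "children" list. Each pass maps tree -> smaller tree.

def drop_invisible(node):
    """Remove subtrees whose (assumed) 'visible' attribute is False."""
    if not node.get("visible", True):
        return None
    kept = [c for c in map(drop_invisible, node.get("children", [])) if c]
    return {**node, "children": kept}

def keep_attrs(allowed):
    """Return a pass that keeps only a whitelist of attributes per node."""
    def f(node):
        slim = {k: v for k, v in node.items() if k in allowed}
        slim["children"] = [f(c) for c in node.get("children", [])]
        return slim
    return f

def flatten_wrappers(node):
    """Collapse layout-only nodes that have exactly one child and no text."""
    kids = [flatten_wrappers(c) for c in node.get("children", [])]
    if len(kids) == 1 and not node.get("text"):
        return kids[0]
    return {**node, "children": kids}

def compose(*passes):
    """Chain passes into a single transformation program."""
    def program(tree):
        for p in passes:
            tree = p(tree)
        return tree
    return program

program = compose(drop_invisible, keep_attrs({"text", "class"}), flatten_wrappers)

tree = {"class": "Frame", "visible": True, "children": [
    {"class": "Layout", "visible": True, "children": [
        {"class": "Button", "text": "OK", "visible": True,
         "bounds": [0, 0, 10, 10], "children": []}]},
    {"class": "Ad", "visible": False, "children": []},
]}

before, after = json.dumps(tree), json.dumps(program(tree))
print(len(before), "->", len(after))  # the compacted serialization is shorter
```

A reward-guided search in the spirit of the paper would score candidate programs like `program` on serialized-length reduction (efficiency) and on whether the agent can still complete its navigation task from the compacted tree (correctness).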
URL
https://arxiv.org/abs/2512.13438