Abstract
AI agents today are mostly siloed: they either retrieve and reason over vast amounts of digital information and knowledge obtained online, or interact with the physical world through embodied perception, planning, and action, but rarely both. This separation limits their ability to solve tasks that require integrated physical and digital intelligence, such as cooking from online recipes, navigating with dynamic map data, or interpreting real-world landmarks using web knowledge. We introduce Embodied Web Agents, a novel paradigm for AI agents that fluidly bridge embodiment and web-scale reasoning. To operationalize this concept, we first develop the Embodied Web Agents task environments, a unified simulation platform that tightly integrates realistic 3D indoor and outdoor environments with functional web interfaces. Building upon this platform, we construct and release the Embodied Web Agents Benchmark, which encompasses a diverse suite of tasks including cooking, navigation, shopping, tourism, and geolocation, all of which require coordinated reasoning across the physical and digital realms and thereby allow systematic assessment of cross-domain intelligence. Experimental results reveal significant performance gaps between state-of-the-art AI systems and human capabilities, establishing both challenges and opportunities at the intersection of embodied cognition and web-scale knowledge access. All datasets, code, and websites are publicly available at our project page.
URL
https://arxiv.org/abs/2506.15677