Abstract
Thanks to advances in generative architectures and large language models, data scientists can now code pipelines of machine-learning operations to process large collections of unstructured data. Recent progress has seen the rise of declarative AI frameworks (e.g., Palimpzest, Lotus, and DocETL) for building optimized and increasingly complex pipelines, but these systems often remain accessible only to expert programmers. In this demonstration, we present PalimpChat, a chat-based interface to Palimpzest that bridges this gap by letting users create and run sophisticated AI pipelines through natural language alone. By integrating Archytas, a ReAct-based reasoning agent, with Palimpzest's suite of relational and LLM-based operators, PalimpChat provides a practical illustration of how a chat interface can make declarative AI frameworks truly accessible to non-experts. Our demo system is publicly available online. At SIGMOD'25, participants can explore three real-world scenarios (scientific discovery, legal discovery, and real estate search) or apply PalimpChat to their own datasets. In this paper, we focus on how PalimpChat, supported by the Palimpzest optimizer, simplifies complex AI workflows such as extracting and analyzing biomedical data.
URL
https://arxiv.org/abs/2502.03368