Tulip Agent -- Enabling LLM-Based Agents to Solve Tasks Using Large Tool Libraries

2024-07-31 17:50:54
Felix Ocker, Daniel Tanneberg, Julian Eggert, Michael Gienger

Abstract

We introduce tulip agent, an architecture for autonomous LLM-based agents with Create, Read, Update, and Delete access to a tool library containing a potentially large number of tools. In contrast to state-of-the-art implementations, tulip agent does not encode the descriptions of all available tools in the system prompt, which counts against the model's context window, or embed the entire prompt for retrieving suitable tools. Instead, the tulip agent can recursively search for suitable tools in its extensible tool library, implemented exemplarily as a vector store. The tulip agent architecture significantly reduces inference costs, allows using even large tool libraries, and enables the agent to adapt and extend its set of tools. We evaluate the architecture with several ablation studies in a mathematics context and demonstrate its generalizability with an application to robotics. A reference implementation and the benchmark are available via the arXiv page linked below.
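
The idea of searching a tool library per sub-task, instead of placing every tool description in the prompt, can be illustrated with a small sketch. This is not the authors' reference implementation: the `ToolLibrary` class, its method names, and the bag-of-words `embed` function are hypothetical stand-ins (a real vector store would use a learned embedding model), and the sub-task list is hard-coded where an LLM would normally produce the decomposition.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: stands in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToolLibrary:
    """Hypothetical in-memory tool library with CRUD access and similarity search."""

    def __init__(self) -> None:
        # name -> (description, callable, embedding of the description)
        self._tools: Dict[str, Tuple[str, Callable, Counter]] = {}

    def create(self, name: str, description: str, fn: Callable) -> None:
        self._tools[name] = (description, fn, embed(description))

    def read(self, name: str) -> Callable:
        return self._tools[name][1]

    def update(self, name: str, description: str, fn: Callable) -> None:
        if name not in self._tools:
            raise KeyError(name)
        self.create(name, description, fn)

    def delete(self, name: str) -> None:
        del self._tools[name]

    def search(self, query: str, top_k: int = 1) -> List[str]:
        """Return up to top_k tool names whose descriptions best match the query."""
        q = embed(query)
        scored = sorted(
            ((cosine(q, emb), name) for name, (_, _, emb) in self._tools.items()),
            reverse=True,
        )
        return [name for score, name in scored[:top_k] if score > 0]


lib = ToolLibrary()
lib.create("add", "add two numbers and return the sum", lambda a, b: a + b)
lib.create("multiply", "multiply two numbers and return the product", lambda a, b: a * b)
lib.create("sqrt", "compute the square root of a number", math.sqrt)

# Assumption: an LLM would decompose the task; here the sub-tasks are fixed.
subtasks = ["add the numbers 2 and 3", "multiply the result by 4"]
selected = [lib.search(s)[0] for s in subtasks]
print(selected)  # ['add', 'multiply']
```

Only the tools returned by `search` would need to be described to the model for a given sub-task, which is how such a design keeps prompt size independent of the library size.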

URL

https://arxiv.org/abs/2407.21778

PDF

https://arxiv.org/pdf/2407.21778.pdf

