Paper Reading AI Learner

Large Language Models as Fiduciaries: A Case Study Toward Robustly Communicating With Artificial Intelligence Through Legal Standards


Abstract

Artificial Intelligence (AI) is taking on increasingly autonomous roles, e.g., browsing the web as a research assistant and managing money. But specifying goals and restrictions for AI behavior is difficult. Similar to how parties to a legal contract cannot foresee every potential "if-then" contingency of their future relationship, we cannot specify desired AI behavior for all circumstances. Legal standards facilitate the robust communication of inherently vague and underspecified goals. Instructions (in the case of language models, "prompts") that employ legal standards will allow AI agents to develop shared understandings of the spirit of a directive that can adapt to novel situations, and generalize expectations regarding acceptable actions to take in unspecified states of the world. Standards have built-in context that is lacking from other goal specification languages, such as plain language and programming languages. Through an empirical study on thousands of evaluation labels we constructed from U.S. court opinions, we demonstrate that large language models (LLMs) are beginning to exhibit an "understanding" of one of the most relevant legal standards for AI agents: fiduciary obligations. Performance comparisons across models suggest that, as LLMs continue to exhibit improved core capabilities, their legal standards understanding will also continue to improve. OpenAI's latest LLM has 78% accuracy on our data, their previous release has 73% accuracy, and a model from their 2020 GPT-3 paper has 27% accuracy (worse than random). Our research is an initial step toward a framework for evaluating AI understanding of legal standards more broadly, and for conducting reinforcement learning with legal feedback (RLLF).
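The evaluation the abstract describes — scoring model answers against labels constructed from court opinions and comparing accuracy across model generations — can be sketched as follows. This is a minimal illustrative harness, not the paper's actual code: the example fact patterns, the label names ("breach" / "no breach"), and the `mock_llm_judgment` stand-in for a real LLM call are all hypothetical.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the gold labels,
    as in the reported 78% / 73% / 27% comparison."""
    assert len(predictions) == len(labels)
    correct = sum(p == l for p, l in zip(predictions, labels))
    return correct / len(labels)

# Toy fact patterns standing in for the thousands of
# court-opinion-derived evaluation labels used in the study.
examples = [
    ("Trustee invested trust assets in his own failing business.", "breach"),
    ("Agent disclosed a conflict and obtained informed consent.", "no breach"),
    ("Director diverted a corporate opportunity to herself.", "breach"),
]

def mock_llm_judgment(fact_pattern):
    # Stand-in for an LLM call that answers whether the described
    # conduct breaches a fiduciary obligation; a real harness would
    # prompt a model here instead of keyword matching.
    if "his own" in fact_pattern or "herself" in fact_pattern:
        return "breach"
    return "no breach"

labels = [label for _, label in examples]
preds = [mock_llm_judgment(text) for text, _ in examples]
print(accuracy(preds, labels))
```

A per-model comparison like the one in the abstract would simply run this scoring loop once per model and report the resulting accuracies side by side.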


URL

https://arxiv.org/abs/2301.10095

PDF

https://arxiv.org/pdf/2301.10095.pdf
