A Fourth Wave of Open Data? Exploring the Spectrum of Scenarios for Open Data and Generative AI

2024-05-07 14:01:33
Hannah Chafetz, Sampriti Saxena, Stefaan G. Verhulst

Abstract

Since late 2022, generative AI has taken the world by storm, with widespread use of tools including ChatGPT, Gemini, and Claude. Generative AI and large language model (LLM) applications are transforming how individuals find and access data and knowledge. However, the intricate relationship between open data and generative AI, and the vast potential it holds for driving innovation in this field, remain underexplored. This white paper seeks to unpack the relationship between open data and generative AI and explore possible components of a new Fourth Wave of Open Data: Is open data becoming AI ready? Is open data moving towards a data commons approach? Is generative AI making open data more conversational? Will generative AI improve open data quality and provenance? Towards this end, we provide a new Spectrum of Scenarios framework. This framework outlines a range of scenarios in which open data and generative AI could intersect and what is required from a data quality and provenance perspective to make open data ready for those specific scenarios. These scenarios include: pretraining, adaptation, inference and insight generation, data augmentation, and open-ended exploration. Through this process, we found that in order for data holders to embrace generative AI to improve open data access and develop greater insights from open data, they first must make progress around five key areas: enhance transparency and documentation, uphold quality and integrity, promote interoperability and standards, improve accessibility and usability, and address ethical considerations.

URL

https://arxiv.org/abs/2405.04333

PDF

https://arxiv.org/pdf/2405.04333.pdf

