The recent success of large language models (LLMs) trained on static, pre-collected, general datasets has sparked numerous research directions and applications. One such direction addresses the non-trivial challenge of integrating pre-trained LLMs into dynamic data distributions, task structures, and user preferences. Pre-trained LLMs, when tailored for specific needs, often experience significant performance degradation in previous knowledge domains -- a phenomenon known as "catastrophic forgetting". While extensively studied in the continual learning (CL) community, it presents new manifestations in the realm of LLMs. In this survey, we provide a comprehensive overview of the current research progress on LLMs within the context of CL. This survey is structured into four main sections: we first give an overview of continually learning LLMs, consisting of two directions of continuity: vertical continuity (or vertical continual learning), i.e., continual adaptation from general to specific capabilities, and horizontal continuity (or horizontal continual learning), i.e., continual adaptation across time and domains (Section 3). We then summarize three stages of learning LLMs in the context of modern CL: Continual Pre-Training (CPT), Domain-Adaptive Pre-Training (DAP), and Continual Fine-Tuning (CFT) (Section 4). Next, we provide an overview of evaluation protocols for continual learning with LLMs, along with the currently available data sources (Section 5). Finally, we discuss intriguing questions pertaining to continual learning for LLMs (Section 6). The full list of papers examined in this survey is available at this https URL.
https://arxiv.org/abs/2404.16789
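As a rough illustration of one standard mitigation for the catastrophic forgetting discussed above, the sketch below interleaves a fraction of earlier-stage examples into each new-domain fine-tuning batch (experience replay). All names and the batch layout are illustrative assumptions, not taken from the survey.

```python
import random

def replay_batches(new_data, old_data, batch_size=8, replay_frac=0.25, seed=0):
    """Yield fine-tuning batches that mix a fraction of earlier-stage
    (replay) examples into each batch of new-domain data -- one canonical
    mitigation for catastrophic forgetting in continual fine-tuning."""
    rng = random.Random(seed)
    n_replay = max(1, int(batch_size * replay_frac))
    n_new = batch_size - n_replay
    for start in range(0, len(new_data), n_new):
        batch = new_data[start:start + n_new]          # fresh new-domain slice
        batch += [rng.choice(old_data) for _ in range(n_replay)]  # replayed items
        rng.shuffle(batch)
        yield batch
```

A larger `replay_frac` trades new-domain learning speed for better retention of earlier capabilities.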
This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (a 4x factor) in real time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF codec, instead of JPEG. All the proposed methods improve PSNR fidelity over Lanczos interpolation and process images in under 10 ms. Out of the 160 participants, 25 teams submitted their code and models. The solutions present novel designs tailored for memory efficiency and runtime on edge devices. This survey describes the best solutions for real-time SR of compressed high-resolution images.
https://arxiv.org/abs/2404.16484
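The challenge's fidelity comparison against the Lanczos baseline rests on PSNR; here is a minimal numpy sketch of the standard metric (array names are illustrative, not from the challenge code):

```python
import numpy as np

def psnr(reference, estimate, max_val=255.0):
    """Peak signal-to-noise ratio in dB: the fidelity metric used to compare
    super-resolution outputs against a baseline such as Lanczos upscaling."""
    mse = np.mean((reference.astype(np.float64) - estimate.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Higher is better; each method's output would be scored against the uncompressed 4K ground truth.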
Generative AI (GAI) can enhance the cognitive, reasoning, and planning capabilities of intelligent modules in the Internet of Vehicles (IoV) by synthesizing augmented datasets, completing sensor data, and making sequential decisions. In addition, the mixture of experts (MoE) can enable the distributed and collaborative execution of AI models without performance degradation between connected vehicles. In this survey, we explore the integration of MoE and GAI to enable Artificial General Intelligence in IoV, which can enable the realization of full autonomy for IoV with minimal human supervision and applicability in a wide range of mobility scenarios, including environment monitoring, traffic management, and autonomous driving. In particular, we present the fundamentals of GAI, MoE, and their interplay applications in IoV. Furthermore, we discuss the potential integration of MoE and GAI in IoV, including distributed perception and monitoring, collaborative decision-making and planning, and generative modeling and simulation. Finally, we present several potential research directions for facilitating the integration.
https://arxiv.org/abs/2404.16356
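For readers unfamiliar with MoE, the gating idea mentioned above can be sketched in a few lines of numpy: a softmax gate scores the experts, only the top-k run, and their outputs are mixed with renormalized gate probabilities. This is a generic textbook sketch, not the formulation used in the surveyed IoV systems; all shapes and names are illustrative.

```python
import numpy as np

def moe_forward(x, expert_weights, gate_weights, top_k=2):
    """Sketch of a mixture-of-experts layer: a softmax gate scores every
    expert, only the top-k experts run, and their outputs are combined
    with the renormalized gate probabilities."""
    logits = x @ gate_weights                       # one score per expert
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                            # softmax gate
    top = np.argsort(probs)[-top_k:]                # indices of the k best experts
    top_p = probs[top] / probs[top].sum()           # renormalize over chosen experts
    outputs = [expert_weights[i] @ x for i in top]  # each expert is a linear map here
    return sum(p * o for p, o in zip(top_p, outputs))
```

Sparse top-k routing is what lets such models distribute computation across devices (or vehicles) without running every expert on every input.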
The awareness of multi-cultural human values is critical to the ability of language models (LMs) to generate safe and personalized responses. However, this awareness of LMs has been insufficiently studied, since the computer science community lacks access to the large-scale real-world data about multi-cultural values. In this paper, we present WorldValuesBench, a globally diverse, large-scale benchmark dataset for the multi-cultural value prediction task, which requires a model to generate a rating response to a value question based on demographic contexts. Our dataset is derived from an influential social science project, World Values Survey (WVS), that has collected answers to hundreds of value questions (e.g., social, economic, ethical) from 94,728 participants worldwide. We have constructed more than 20 million examples of the type "(demographic attributes, value question) $\rightarrow$ answer" from the WVS responses. We perform a case study using our dataset and show that the task is challenging for strong open and closed-source models. On merely $11.1\%$, $25.0\%$, $72.2\%$, and $75.0\%$ of the questions, Alpaca-7B, Vicuna-7B-v1.5, Mixtral-8x7B-Instruct-v0.1, and GPT-3.5 Turbo can respectively achieve $<0.2$ Wasserstein 1-distance from the human normalized answer distributions. WorldValuesBench opens up new research avenues in studying limitations and opportunities in multi-cultural value awareness of LMs.
https://arxiv.org/abs/2404.16308
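The paper's evaluation threshold ($<0.2$ Wasserstein 1-distance between model and human answer distributions) can be illustrated with a small sketch. Assuming both distributions share the same ordered rating scale, W1 reduces to the CDF-difference sum below; names are illustrative, not from the benchmark code.

```python
import numpy as np

def wasserstein_1(p, q, support):
    """Wasserstein 1-distance between two discrete distributions over the
    same ordered rating scale: the absolute CDF difference, weighted by
    the gaps between adjacent support points."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    cdf_gap = np.abs(np.cumsum(p) - np.cumsum(q))[:-1]  # last CDF entries are both 1
    return float(np.sum(cdf_gap * np.diff(support)))
```

Intuitively this is the minimum "mass times distance" needed to turn one rating distribution into the other, so it respects the ordering of the scale in a way plain accuracy does not.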
This paper reviews the NTIRE 2024 RAW Image Super-Resolution Challenge, highlighting the proposed solutions and results. New methods for RAW Super-Resolution could be essential in modern Image Signal Processing (ISP) pipelines; however, this problem is not as well explored as in the RGB domain. The goal of this challenge is to upscale RAW Bayer images by 2x, considering unknown degradations such as noise and blur. In the challenge, a total of 230 participants registered, and 45 submitted results during the challenge period. The performance of the top-5 submissions is reviewed and provided here as a gauge for the current state-of-the-art in RAW Image Super-Resolution.
https://arxiv.org/abs/2404.16223
This paper reviews the AIS 2024 Video Quality Assessment (VQA) Challenge, focused on User-Generated Content (UGC). The aim of this challenge is to gather deep learning-based methods capable of estimating the perceptual quality of UGC videos. The user-generated videos from the YouTube UGC Dataset include diverse content (sports, games, lyrics, anime, etc.), qualities, and resolutions. The proposed methods must process 30 FHD frames in under 1 second. In the challenge, a total of 102 participants registered, and 15 submitted code and models. The performance of the top-5 submissions is reviewed and provided here as a survey of diverse deep models for efficient video quality assessment of user-generated content.
https://arxiv.org/abs/2404.16205
This survey analyzes intermediate fusion methods in collaborative perception for autonomous driving, categorized by real-world challenges. We examine various methods, detailing their features and the evaluation metrics they employ. The focus is on addressing challenges like transmission efficiency, localization errors, communication disruptions, and heterogeneity. Moreover, we explore strategies to counter adversarial attacks and defenses, as well as approaches to adapt to domain shifts. The objective is to present an overview of how intermediate fusion methods effectively meet these diverse challenges, highlighting their role in advancing the field of collaborative perception in autonomous driving.
https://arxiv.org/abs/2404.16139
Sequence modeling is a crucial area across various domains, including Natural Language Processing (NLP), speech recognition, time series forecasting, music generation, and bioinformatics. Recurrent Neural Networks (RNNs) and Long Short-Term Memory networks (LSTMs) have historically dominated sequence modeling tasks like Machine Translation, Named Entity Recognition (NER), etc. However, the advancement of transformers has led to a shift in this paradigm, given their superior performance. Yet, transformers suffer from $O(N^2)$ attention complexity and challenges in handling inductive bias. Several variations have been proposed to address these issues which use spectral networks or convolutions and have performed well on a range of tasks. However, they still have difficulty in dealing with long sequences. State Space Models (SSMs) have emerged as promising alternatives for sequence modeling paradigms in this context, especially with the advent of S4 and its variants, such as S4ND, HiPPO, Hyena, Diagonal State Spaces (DSS), Gated State Spaces (GSS), the Linear Recurrent Unit (LRU), Liquid-S4, Mamba, etc. In this survey, we categorize the foundational SSMs based on three paradigms, namely Gating architectures, Structural architectures, and Recurrent architectures. This survey also highlights diverse applications of SSMs across domains such as vision, video, audio, speech, language (especially long sequence modeling), medical (including genomics), chemical (like drug design), recommendation systems, and time series analysis, including tabular data. Moreover, we consolidate the performance of SSMs on benchmark datasets like Long Range Arena (LRA), WikiText, GLUE, Pile, ImageNet, Kinetics-400, and SSv2, as well as video datasets such as Breakfast, COIN, and LVU, and various time series datasets. The project page for the Mamba-360 work is available at this https URL.
https://arxiv.org/abs/2404.16112
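At the core of the S4-family models surveyed above is a discrete linear state space recurrence. A naive numpy unroll of it, for exposition only (S4/Mamba actually evaluate this via convolutions or parallel scans, and their parameterizations differ), might look like:

```python
import numpy as np

def ssm_scan(A, B, C, u):
    """Unroll the linear recurrence at the heart of S4-style state space
    models: x_k = A x_{k-1} + B u_k, y_k = C x_k, scanned over a sequence.
    This sequential loop is only meant to show the recurrence itself."""
    x = np.zeros(A.shape[0])
    ys = []
    for u_k in u:
        x = A @ x + B * u_k  # state update (B is a vector, u_k a scalar input)
        ys.append(C @ x)     # readout
    return np.array(ys)
```

Because the recurrence is linear and time-invariant, the whole scan can be rewritten as a convolution with a precomputed kernel, which is what makes these models efficient on long sequences.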
Ground Penetrating Radar (GPR) has been widely studied as a tool for extracting soil parameters relevant to agriculture and horticulture. When combined with Machine-Learning-based (ML) methods, high-resolution Stepped Frequency Continuous Wave (SFCW) radar measurements hold the promise of giving cost-effective access to depth-resolved soil parameters, including at root-level depth. As a first step in this direction, we perform an extensive field survey with a tractor-mounted SFCW GPR instrument. Using ML data processing we test the GPR instrument's capabilities to predict the apparent electrical conductivity (ECaR) as measured by a simultaneously recording Electromagnetic Induction (EMI) instrument. The large-scale field measurement campaign with 3472 co-registered and geo-located GPR and EMI data samples distributed over ~6600 square meters was performed on a golf course. The selected terrain benefits from a high surface homogeneity, but also features the challenge of only small, and hence hard to discern, variations in the measured soil parameter. Based on the quantitative results we suggest the use of the nugget-to-sill ratio as a performance metric for the evaluation of end-to-end ML performance in the agricultural setting and discuss the limiting factors in the multi-sensor regression setting. The code is released as open source and available at this https URL.
https://arxiv.org/abs/2404.15961
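As a hedged sketch of the geostatistics behind the proposed metric: the nugget-to-sill ratio comes from a variogram fit, where the nugget captures spatially uncorrelated (e.g., sensor) variance and the sill the total variance. Below, a toy 1-D empirical semivariogram plus the ratio, assuming the nugget and partial sill have already been fitted; the paper's actual fitting procedure is not reproduced, and all names are illustrative.

```python
import numpy as np

def semivariogram(z, max_lag):
    """Empirical semivariogram of a 1-D transect of measurements:
    gamma(h) = 0.5 * mean((z[i+h] - z[i])^2) for each lag h."""
    return np.array([0.5 * np.mean((z[h:] - z[:-h]) ** 2)
                     for h in range(1, max_lag + 1)])

def nugget_to_sill(nugget, partial_sill):
    """Ratio suggested as a performance metric: near 1 means the signal is
    dominated by spatially uncorrelated noise, near 0 means most of the
    variance is spatially structured."""
    return nugget / (nugget + partial_sill)
```

For pure white noise the semivariogram is flat at the sample variance (all nugget, ratio 1); a spatially smooth field rises from a small nugget toward the sill.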
State space models (SSMs) with selection mechanisms and hardware-aware architectures, namely Mamba, have recently demonstrated significant promise in long-sequence modeling. Since the self-attention mechanism in transformers has quadratic complexity in image size and increasing computational demands, researchers are now exploring how to adapt Mamba for computer vision tasks. This paper is the first comprehensive survey aiming to provide an in-depth analysis of Mamba models in the field of computer vision. It begins by exploring the foundational concepts contributing to Mamba's success, including the state space model framework, selection mechanisms, and hardware-aware design. Next, we review these vision Mamba models by categorizing them into foundational ones and those enhanced with techniques such as convolution, recurrence, and attention to improve their sophistication. We further delve into the widespread applications of Mamba in vision tasks, which include their use as a backbone in various levels of vision processing. This encompasses general visual tasks, medical visual tasks (e.g., 2D/3D segmentation, classification, and image registration), and remote sensing visual tasks. We specifically introduce general visual tasks from two levels: high/mid-level vision (e.g., object detection, segmentation, video classification) and low-level vision (e.g., image super-resolution, image restoration, visual generation). We hope this endeavor will spark additional interest within the community to address current challenges and further apply Mamba models in computer vision.
https://arxiv.org/abs/2404.15956
Since the inception of the Transformer architecture in 2017, Large Language Models (LLMs) such as GPT and BERT have evolved significantly, impacting various industries with their advanced capabilities in language understanding and generation. These models have shown potential to transform the medical field, highlighting the necessity for specialized evaluation frameworks to ensure their effective and ethical deployment. This comprehensive survey delineates the extensive application and requisite evaluation of LLMs within healthcare, emphasizing the critical need for empirical validation to fully exploit their capabilities in enhancing healthcare outcomes. Our survey is structured to provide an in-depth analysis of LLM applications across clinical settings, medical text data processing, research, education, and public health awareness. We begin by exploring the roles of LLMs in different medical applications, detailing how they are evaluated based on their performance in tasks such as clinical application, medical text data processing, information retrieval, data analysis, medical scientific writing, educational content generation etc. The subsequent sections delve into the methodologies employed in these evaluations, discussing the benchmarks and metrics used to assess the models' effectiveness, accuracy, and ethical alignment. Through this survey, we aim to equip healthcare professionals, researchers, and policymakers with a comprehensive understanding of the potential strengths and limitations of LLMs in medical applications. By providing detailed insights into the evaluation processes and the challenges faced in integrating LLMs into healthcare, this survey seeks to guide the responsible development and deployment of these powerful models, ensuring they are harnessed to their full potential while maintaining stringent ethical standards.
https://arxiv.org/abs/2404.15777
Chain-of-Thought (CoT) has been a widely adopted prompting method, eliciting impressive reasoning abilities of Large Language Models (LLMs). Inspired by the sequential thought structure of CoT, a number of Chain-of-X (CoX) methods have been developed to address various challenges across diverse domains and tasks involving LLMs. In this paper, we provide a comprehensive survey of Chain-of-X methods for LLMs in different contexts. Specifically, we categorize them by taxonomies of nodes, i.e., the X in CoX, and application tasks. We also discuss the findings and implications of existing CoX methods, as well as potential future directions. Our survey aims to serve as a detailed and up-to-date resource for researchers seeking to apply the idea of CoT to broader scenarios.
https://arxiv.org/abs/2404.15676
The rapidly changing architecture and functionality of electrical networks and the increasing penetration of renewable and distributed energy resources have resulted in various technological and managerial challenges. These have rendered traditional centralized energy-market paradigms insufficient due to their inability to support the dynamic and evolving nature of the network. This survey explores how multi-agent reinforcement learning (MARL) can support the decentralization and decarbonization of energy networks and mitigate the associated challenges. This is achieved by specifying key computational challenges in managing energy networks, reviewing recent research progress on addressing them, and highlighting open challenges that may be addressed using MARL.
https://arxiv.org/abs/2404.15583
We develop a deep neural network (DNN) to obtain photometry of saturated stars in the All-Sky Automated Survey for Supernovae (ASAS-SN). The DNN can obtain unbiased photometry for stars from g=4 to 14 mag with a dispersion (15%-85% 1sigma range around median) of 0.12 mag for saturated (g<11.5 mag) stars. More importantly, the light curve of a non-variable saturated star has a median dispersion of only 0.037 mag. The DNN light curves are, in many cases, spectacularly better than provided by the standard ASAS-SN pipelines. While the network was trained on g band data from only one of ASAS-SN's 20 cameras, initial experiments suggest that it can be used for any camera and the older ASAS-SN V band data as well. The dominant problems seem to be associated with correctable issues in the ASAS-SN data reduction pipeline for saturated stars more than the DNN itself. The method is publicly available as a light curve option on ASAS-SN Sky Patrol v1.0.
https://arxiv.org/abs/2404.15405
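One plausible reading of the paper's dispersion definition (the 15%-85% percentile range around the median as a 1-sigma proxy) is half that percentile range; for a unit Gaussian this evaluates to about 1.04 sigma, since the 85th percentile of N(0,1) sits at z ≈ 1.036. A sketch under that assumption, with illustrative names:

```python
import numpy as np

def dispersion(residuals):
    """Robust scatter estimate: half the 15th-to-85th percentile range of
    photometric residuals. Outlier-insensitive, and close to 1 sigma for
    Gaussian residuals (the +/-1 sigma interval spans roughly the 16th to
    84th percentiles)."""
    p15, p85 = np.percentile(residuals, [15.0, 85.0])
    return 0.5 * (p85 - p15)
```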
This paper addresses the important societal challenge of water quality analysis. As one of the key factors in the economic and social development of society, the provision of water and ensuring its quality has always remained one of the top priorities of public authorities. To ensure the quality of water, different methods for monitoring and assessing the water networks, such as offline and online surveys, are used. However, these surveys have several limitations, such as the limited number of participants and low frequency due to the labor involved in conducting such surveys. In this paper, we propose a Natural Language Processing (NLP) framework to automatically collect and analyze water-related posts from social media for data-driven decisions. The proposed framework is composed of two components, namely (i) text classification and (ii) topic modeling. For text classification, we propose a merit-fusion-based framework incorporating several Large Language Models (LLMs), where different weight selection and optimization methods are employed to assign weights to the LLMs. In topic modeling, we employed the BERTopic library to discover the hidden topic patterns in the water-related tweets. We also analyzed relevant tweets originating from different regions and countries to explore global, regional, and country-specific issues and water-related concerns. We also collected and manually annotated a large-scale dataset, which is expected to facilitate future research on the topic.
https://arxiv.org/abs/2404.14977
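The merit-fusion idea above, weighting several LLM classifiers and combining their outputs, can be sketched as a weighted late fusion of class-probability matrices. The weight values themselves would come from the paper's selection/optimization schemes, which are not reproduced here; everything below is illustrative.

```python
import numpy as np

def fuse_predictions(prob_matrices, weights):
    """Late-fusion sketch: each model's class-probability matrix is scaled
    by a merit weight and the weighted average is re-argmaxed into fused
    class decisions."""
    weights = np.asarray(weights, float)
    weights = weights / weights.sum()               # normalize the merit weights
    stacked = np.stack(prob_matrices)               # (n_models, n_samples, n_classes)
    fused = np.tensordot(weights, stacked, axes=1)  # weighted average of probabilities
    return fused.argmax(axis=1)                     # fused per-sample decisions
```

Shifting weight toward a stronger model lets its probabilities dominate disagreements, which is the point of learning the weights rather than voting uniformly.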
Hyperspectral image classification is a challenging task due to the high dimensionality and complex nature of hyperspectral data. In recent years, deep learning techniques have emerged as powerful tools for addressing these challenges. This survey provides a comprehensive overview of the current trends and future prospects in hyperspectral image classification, focusing on the advancements from deep learning models to the emerging use of transformers. We review the key concepts, methodologies, and state-of-the-art approaches in deep learning for hyperspectral image classification. Additionally, we discuss the potential of transformer-based models in this field and highlight the advantages and challenges associated with these approaches. Comprehensive experimental results have been undertaken using three Hyperspectral datasets to verify the efficacy of various conventional deep-learning models and Transformers. Finally, we outline future research directions and potential applications that can further enhance the accuracy and efficiency of hyperspectral image classification. The Source code is available at this https URL.
https://arxiv.org/abs/2404.14955
Graphs play an important role in representing complex relationships in various domains like social networks, knowledge graphs, and molecular discovery. With the advent of deep learning, Graph Neural Networks (GNNs) have emerged as a cornerstone in Graph Machine Learning (Graph ML), facilitating the representation and processing of graph structures. Recently, LLMs have demonstrated unprecedented capabilities in language tasks and are widely adopted in a variety of applications such as computer vision and recommender systems. This remarkable success has also attracted interest in applying LLMs to the graph domain. Increasing efforts have been made to explore the potential of LLMs in advancing Graph ML's generalization, transferability, and few-shot learning ability. Meanwhile, graphs, especially knowledge graphs, are rich in reliable factual knowledge, which can be utilized to enhance the reasoning capabilities of LLMs and potentially alleviate their limitations such as hallucinations and the lack of explainability. Given the rapid progress of this research direction, a systematic review summarizing the latest advancements for Graph ML in the era of LLMs is necessary to provide an in-depth understanding to researchers and practitioners. Therefore, in this survey, we first review the recent developments in Graph ML. We then explore how LLMs can be utilized to enhance the quality of graph features, alleviate the reliance on labeled data, and address challenges such as graph heterogeneity and out-of-distribution (OOD) generalization. Afterward, we delve into how graphs can enhance LLMs, highlighting their abilities to enhance LLM pre-training and inference. Furthermore, we investigate various applications and discuss the potential future directions in this promising field.
https://arxiv.org/abs/2404.14928
Large Language Models (LLMs) are frequently discussed in academia and the general public as support tools for virtually any use case that relies on the production of text, including software engineering. Currently there is much debate, but little empirical evidence, regarding the practical usefulness of LLM-based tools such as ChatGPT for engineers in industry. We conduct an observational study of 24 professional software engineers who have been using ChatGPT over a period of one week in their jobs, and qualitatively analyse their dialogues with the chatbot as well as their overall experience (as captured by an exit survey). We find that, rather than expecting ChatGPT to generate ready-to-use software artifacts (e.g., code), practitioners more often use ChatGPT to receive guidance on how to solve their tasks or learn about a topic in more abstract terms. We also propose a theoretical framework for how (i) purpose of the interaction, (ii) internal factors (e.g., the user's personality), and (iii) external factors (e.g., company policy) together shape the experience (in terms of perceived usefulness and trust). We envision that our framework can be used by future research to further the academic discussion on LLM usage by software engineering practitioners, and to serve as a reference point for the design of future empirical LLM research in this domain.
https://arxiv.org/abs/2404.14901
With the increasingly giant scales of (causal) large language models (LLMs), inference efficiency has become one of the core concerns alongside the improved performance. In contrast to the memory footprint, the latency bottleneck seems to be of greater importance, as there can be billions of requests to an LLM (e.g., GPT-4) per day. The bottleneck is mainly due to the autoregressive nature of LLMs, where tokens can only be generated sequentially during decoding. To alleviate the bottleneck, the idea of speculative execution, which originates from the field of computer architecture, is introduced to LLM decoding in a \textit{draft-then-verify} style. Under this regime, a sequence of tokens is first drafted at a fast pace by utilizing some heuristics, and then the tokens are verified in parallel by the LLM. As the costly sequential inference is parallelized, LLM decoding speed can be significantly boosted. Driven by the success of LLMs in the past couple of years, a growing literature in this direction has emerged. Yet, a position survey to summarize the current landscape and draw a roadmap for future development of this promising area is lacking. To meet this demand, we present the very first survey paper that reviews and unifies the literature on speculative execution in LLMs (e.g., blockwise parallel decoding, speculative decoding, etc.) in a comprehensive framework and a systematic taxonomy. Based on the taxonomy, we present a critical review and comparative analysis of the current methods. Finally, we highlight various key challenges and future directions to further develop the area.
https://arxiv.org/abs/2404.14897
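The draft-then-verify control flow described above can be sketched in plain Python with greedy acceptance. Real systems score all draft positions in one parallel forward pass and often use stochastic acceptance rules that preserve the target distribution; this toy loop only shows the accept/fall-back logic, and `target_next_token` is a stand-in callable, not a real API.

```python
def verify_draft(draft_tokens, target_next_token):
    """Greedy-acceptance sketch of draft-then-verify decoding: a cheap
    draft proposes several tokens; the target model (here a callable
    returning its greedy next token for a prefix) checks them. Accepted:
    the longest prefix the target agrees with, plus the target's own token
    at the first disagreement -- so each verify step emits at least one
    token."""
    accepted = []
    for tok in draft_tokens:
        t = target_next_token(accepted)
        if t == tok:
            accepted.append(tok)  # draft guessed right: keep it
        else:
            accepted.append(t)    # mismatch: fall back to the target's token
            break
    else:
        accepted.append(target_next_token(accepted))  # bonus token after a full match
    return accepted
```

The speedup comes from verifying all drafted positions in one batched forward pass instead of one pass per token, so a good draft model amortizes the target model's latency across several emitted tokens.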
The integration of Foundation Models (FMs) with Federated Learning (FL) presents a transformative paradigm in Artificial Intelligence (AI), offering enhanced capabilities while addressing concerns of privacy, data decentralization, and computational efficiency. This paper provides a comprehensive survey of the emerging field of Federated Foundation Models (FedFM), elucidating their synergistic relationship and exploring novel methodologies, challenges, and future directions that the FL research field needs to focus on in order to thrive in the age of foundation models. A systematic multi-tiered taxonomy is proposed, categorizing existing FedFM approaches for model training, aggregation, trustworthiness, and incentivization. Key challenges, including how to enable FL to deal with high complexity of computational demands, privacy considerations, contribution evaluation, and communication efficiency, are thoroughly discussed. Moreover, the paper explores the intricate challenges of communication, scalability and security inherent in training/fine-tuning FMs via FL, highlighting the potential of quantum computing to revolutionize the training, inference, optimization and data encryption processes. This survey underscores the importance of further research to propel innovation in FedFM, emphasizing the need for developing trustworthy solutions. It serves as a foundational guide for researchers and practitioners interested in contributing to this interdisciplinary and rapidly advancing field.
https://arxiv.org/abs/2404.15381
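For context on the "aggregation" branch of the proposed taxonomy, the baseline rule most FL aggregation methods build on is federated averaging (FedAvg); below is a minimal numpy sketch over flattened parameter vectors, with illustrative names.

```python
import numpy as np

def fedavg(client_params, client_sizes):
    """Federated averaging (FedAvg): the server averages client parameter
    vectors weighted by local dataset size, without ever seeing the raw
    data -- the starting point for the FedFM aggregation methods surveyed."""
    sizes = np.asarray(client_sizes, float)
    weights = sizes / sizes.sum()       # each client's share of the total data
    stacked = np.stack(client_params)   # (n_clients, n_params)
    return weights @ stacked            # size-weighted parameter average
```

FedFM approaches refine this baseline with, e.g., trust-aware or communication-efficient aggregation, but the size-weighted average is the common reference point.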