Human understanding and generation are critical for modeling digital humans and humanoid embodiments. Recently, Human-centric Foundation Models (HcFMs), inspired by the success of generalist models such as large language and vision models, have emerged to unify diverse human-centric tasks into a single framework, surpassing traditional task-specific approaches. In this survey, we present a comprehensive overview of HcFMs by proposing a taxonomy that categorizes current approaches into four groups: (1) Human-centric Perception Foundation Models that capture fine-grained features for multi-modal 2D and 3D understanding; (2) Human-centric AIGC Foundation Models that generate high-fidelity, diverse human-related content; (3) Unified Perception and Generation Models that integrate these capabilities to enhance both human understanding and synthesis; and (4) Human-centric Agentic Foundation Models that extend beyond perception and generation to learn human-like intelligence and interactive behaviors for humanoid embodied tasks. We review state-of-the-art techniques and discuss emerging challenges and future research directions. This survey aims to serve as a roadmap for researchers and practitioners working towards more robust, versatile, and intelligent digital human and embodiment modeling.
https://arxiv.org/abs/2502.08556
Image quality assessment (IQA) represents a pivotal challenge in image-focused technologies, significantly influencing the advancement trajectory of image processing and computer vision. Recently, IQA has witnessed a notable surge in innovative research efforts, driven by the emergence of novel architectural paradigms and sophisticated computational techniques. This survey delivers an extensive analysis of contemporary IQA methodologies, organized according to their application scenarios, serving as a beneficial reference for both beginners and experienced researchers. We analyze the advantages and limitations of current approaches and suggest potential future research pathways. The survey encompasses both general and specific IQA methodologies, including conventional statistical measures, machine learning techniques, and cutting-edge deep learning models such as convolutional neural networks (CNNs) and Transformer models. The analysis within this survey highlights the necessity for distortion-specific IQA methods tailored to various application scenarios, emphasizing the significance of practicality, interpretability, and ease of implementation in future developments.
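Among the conventional statistical measures the survey covers, full-reference metrics such as PSNR are the simplest to state; a minimal numpy sketch (the images and distortion below are illustrative):

```python
import numpy as np

def psnr(ref, test, max_val=255.0):
    """Peak Signal-to-Noise Ratio: higher means the test image is closer to the reference."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

ref = np.full((8, 8), 128, dtype=np.uint8)
noisy = ref.copy()
noisy[0, 0] = 138                      # a single-pixel distortion
score = psnr(ref, noisy)               # roughly 46 dB for this toy case
```

Learned no-reference methods (CNN- or Transformer-based) replace such closed-form comparisons when no pristine reference exists.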
https://arxiv.org/abs/2502.08540
Handwritten Text Recognition (HTR) has become an essential field within pattern recognition and machine learning, with applications spanning historical document preservation to modern data entry and accessibility solutions. The complexity of HTR lies in the high variability of handwriting, which makes it challenging to develop robust recognition systems. This survey examines the evolution of HTR models, tracing their progression from early heuristic-based approaches to contemporary state-of-the-art neural models, which leverage deep learning techniques. The scope of the field has also expanded, with models initially capable of recognizing only word-level content progressing to recent end-to-end document-level approaches. Our paper categorizes existing work into two primary levels of recognition: (1) \emph{up to line-level}, encompassing word and line recognition, and (2) \emph{beyond line-level}, addressing paragraph- and document-level challenges. We provide a unified framework that examines research methodologies, recent advances in benchmarking, key datasets in the field, and a discussion of the results reported in the literature. Finally, we identify pressing research challenges and outline promising future directions, aiming to equip researchers and practitioners with a roadmap for advancing the field.
https://arxiv.org/abs/2502.08417
Data scarcity significantly complicates the continual learning problem, i.e., how a deep neural network learns in dynamic environments with very few samples. However, recent progress in few-shot class incremental learning (FSCIL) methods and related studies offers valuable insight into how to tackle the problem. This paper presents a comprehensive survey on FSCIL that highlights several important aspects: comprehensive and formal objectives of FSCIL approaches, the importance of prototype rectification, new learning paradigms based on pre-trained models and language-guided mechanisms, deeper analysis of FSCIL performance metrics and evaluation, and the practical contexts of FSCIL in various areas. Our extensive discussion presents the open challenges, potential solutions, and future directions of FSCIL.
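Prototype rectification starts from the nearest-class-mean classifier that most FSCIL methods build on; a minimal numpy sketch of that unrectified baseline (features and classes below are synthetic, not from any specific paper):

```python
import numpy as np

def class_prototypes(features, labels):
    """One prototype (mean embedding) per class, the representation FSCIL methods rectify."""
    classes = np.unique(labels)
    return classes, np.stack([features[labels == c].mean(axis=0) for c in classes])

def nearest_prototype(queries, classes, protos):
    """Assign each query to the class whose prototype is closest in Euclidean distance."""
    d = np.linalg.norm(queries[:, None, :] - protos[None, :, :], axis=-1)
    return classes[d.argmin(axis=1)]

rng = np.random.default_rng(0)
# Two well-separated synthetic classes, 5 shots each, 4-dim embeddings.
feats = np.vstack([rng.normal(0.0, 0.1, (5, 4)), rng.normal(3.0, 0.1, (5, 4))])
labels = np.array([0] * 5 + [1] * 5)
classes, protos = class_prototypes(feats, labels)
pred = nearest_prototype(np.array([[0.0, 0, 0, 0], [3.0, 3, 3, 3]]), classes, protos)
```

With only a few shots per class, the empirical means are biased, which is what rectification techniques aim to correct.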
https://arxiv.org/abs/2502.08181
Visual contrastive learning aims to learn representations by contrasting similar (positive) and dissimilar (negative) pairs of data samples. The design of these pairs significantly impacts representation quality, training efficiency, and computational cost. A well-curated set of pairs leads to stronger representations and faster convergence. As contrastive pre-training sees wider adoption for solving downstream tasks, data curation becomes essential for optimizing its effectiveness. In this survey, we attempt to create a taxonomy of existing techniques for positive and negative pair curation in contrastive learning, and describe them in detail.
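The effect of pair curation shows up directly in the InfoNCE objective that most visual contrastive methods optimize; a minimal single-anchor sketch (vectors and temperature are illustrative):

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: pull the positive close, push negatives away."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([cos(anchor, positive)] + [cos(anchor, n) for n in negatives])
    logits = logits / temperature
    logits -= logits.max()                       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                     # positive sits at index 0

anchor = np.array([1.0, 0.0])
good_pos = np.array([0.9, 0.1])                  # well-curated positive: nearly aligned
bad_pos = np.array([0.0, 1.0])                   # poorly-curated positive: orthogonal
negs = [np.array([0.0, 1.0]), np.array([-1.0, 0.2])]

loss_good = info_nce(anchor, good_pos, negs)
loss_bad = info_nce(anchor, bad_pos, negs)       # a bad positive inflates the loss
```

The curation techniques the survey taxonomizes are, in effect, different strategies for choosing `positive` and `negatives` here.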
https://arxiv.org/abs/2502.08134
A large number of studies rely on closed-style multiple-choice surveys to evaluate cultural alignment in Large Language Models (LLMs). In this work, we challenge this constrained evaluation paradigm and explore more realistic, unconstrained approaches. Using the World Values Survey (WVS) and Hofstede Cultural Dimensions as case studies, we demonstrate that LLMs exhibit stronger cultural alignment in less constrained settings, where responses are not forced. Additionally, we show that even minor changes, such as reordering survey choices, lead to inconsistent outputs, exposing the limitations of closed-style evaluations. Our findings advocate for more robust and flexible evaluation frameworks that focus on specific cultural proxies, encouraging more nuanced and accurate assessments of cultural alignment in LLMs.
https://arxiv.org/abs/2502.08045
Foundation models have become general-purpose assistants, exhibiting diverse capabilities across numerous domains through training on web-scale data. It remains challenging to precisely characterize even a fraction of the full spectrum of capabilities and potential risks in any new model. Existing evaluation approaches often require significant human effort, and it is taking increasing effort to design ever harder challenges for more capable models. We introduce Automated Capability Discovery (ACD), a framework that designates one foundation model as a scientist to systematically propose open-ended tasks probing the abilities of a subject model (potentially itself). By combining frontier models with ideas from the field of open-endedness, ACD automatically and systematically uncovers both surprising capabilities and failures in the subject model. We demonstrate ACD across a range of foundation models (including the GPT, Claude, and Llama series), showing that it automatically reveals thousands of capabilities that would be challenging for any single team to uncover. We further validate our method's automated scoring with extensive human surveys, observing high agreement between model-generated and human evaluations. By leveraging foundation models' ability to both create tasks and self-evaluate, ACD is a significant step toward scalable, automated evaluation of novel AI systems. All code and evaluation logs are open-sourced at this https URL.
https://arxiv.org/abs/2502.07577
Vision Large Language Models (VLMs) combine visual understanding with natural language processing, enabling tasks like image captioning, visual question answering, and video analysis. While VLMs show impressive capabilities across domains such as autonomous vehicles, smart surveillance, and healthcare, their deployment on resource-constrained edge devices remains challenging due to processing power, memory, and energy limitations. This survey explores recent advancements in optimizing VLMs for edge environments, focusing on model compression techniques, including pruning, quantization, knowledge distillation, and specialized hardware solutions that enhance efficiency. We provide a detailed discussion of efficient training and fine-tuning methods, edge deployment challenges, and privacy considerations. Additionally, we discuss the diverse applications of lightweight VLMs across healthcare, environmental monitoring, and autonomous systems, illustrating their growing impact. By highlighting key design strategies, current challenges, and offering recommendations for future directions, this survey aims to inspire further research into the practical deployment of VLMs, ultimately making advanced AI accessible in resource-limited settings.
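Of the compression techniques discussed, quantization is the easiest to sketch; a minimal symmetric int8 post-training scheme (a simplified illustration, not any particular toolkit's implementation):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor post-training quantization to int8."""
    scale = np.abs(w).max() / 127.0              # map the largest weight to +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.array([[0.50, -0.25], [0.10, 0.0]], dtype=np.float32)
q, scale = quantize_int8(w)                      # 4x smaller than float32 storage
w_hat = dequantize(q, scale)
err = np.abs(w - w_hat).max()                    # bounded by half a quantization step
```

Real edge deployments layer per-channel scales, calibration data, and quantization-aware fine-tuning on top of this idea.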
https://arxiv.org/abs/2502.07855
The Transiting Exoplanet Survey Satellite (TESS) is surveying a large fraction of the sky, generating a vast database of photometric time series data that requires thorough analysis to identify exoplanetary transit signals. Automated learning approaches have been successfully applied to identify transit signals. However, most existing methods focus on the classification and validation of candidates, while few efforts have explored new techniques for the search of candidates. To search for new exoplanet transit candidates, we propose an approach to identify exoplanet transit signals without the need for phase folding or assuming periodicity in the transit signals, such as those observed in multi-transit light curves. To achieve this, we implement a new neural network inspired by Transformers to directly process Full Frame Image (FFI) light curves to detect exoplanet transits. Transformers, originally developed for natural language processing, have recently demonstrated significant success in capturing long-range dependencies compared to previous approaches focused on sequential data. This ability allows us to employ multi-head self-attention to identify exoplanet transit signals directly from the complete light curves, combined with background and centroid time series, without requiring prior transit parameters. The network is trained to learn characteristics of the transit signal, like the dip shape, which helps distinguish planetary transits from other variability sources. Our model successfully identified 214 new planetary system candidates, including 122 multi-transit light curves, 88 single-transit and 4 multi-planet systems from TESS sectors 1-26 with a radius > 0.27 $R_{\mathrm{Jupiter}}$, demonstrating its ability to detect transits regardless of their periodicity.
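The multi-head self-attention the network relies on reduces, per head, to scaled dot-product attention over the sequence; a single-head numpy sketch with a random stand-in for an embedded light curve (shapes and weights are illustrative):

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Scaled dot-product self-attention: every time step attends to the full sequence,
    which is how long-range dependencies are captured without phase folding."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over time steps
    return weights @ v, weights

rng = np.random.default_rng(1)
seq_len, d = 16, 8
flux = rng.normal(size=(seq_len, d))     # stand-in for embedded FFI light-curve points
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out, attn = self_attention(flux, Wq, Wk, Wv)
```

Because each output position sees the whole curve, a single transit dip can be detected wherever it occurs, without assuming periodicity.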
https://arxiv.org/abs/2502.07542
Greenwashing is an effort to mislead the public about the environmental impact of an entity, such as a state or company. We provide a comprehensive survey of the scientific literature addressing natural language processing methods to identify potentially misleading climate-related corporate communications, indicative of greenwashing. We break the detection of greenwashing into intermediate tasks, and review the state-of-the-art approaches for each of them. We discuss datasets, methods, and results, as well as limitations and open challenges. We also provide an overview of how far the field has come as a whole, and point out future research directions.
https://arxiv.org/abs/2502.07541
Ensuring fairness in decentralized multi-agent systems presents significant challenges due to emergent biases, systemic inefficiencies, and conflicting agent incentives. This paper provides a comprehensive survey of fairness in multi-agent AI, introducing a novel framework where fairness is treated as a dynamic, emergent property of agent interactions. The framework integrates fairness constraints, bias mitigation strategies, and incentive mechanisms to align autonomous agent behaviors with societal values while balancing efficiency and robustness. Through empirical validation, we demonstrate that incorporating fairness constraints results in more equitable decision-making. This work bridges the gap between AI ethics and system design, offering a foundation for accountable, transparent, and socially responsible multi-agent AI systems.
https://arxiv.org/abs/2502.07254
Transformers have become foundational for visual tasks such as object detection, semantic segmentation, and video understanding, but their quadratic complexity in attention mechanisms presents scalability challenges. To address these limitations, the Mamba architecture utilizes state-space models (SSMs) for linear scalability, efficient processing, and improved contextual awareness. This paper investigates Mamba architecture for visual domain applications and its recent advancements, including Vision Mamba (ViM) and VideoMamba, which introduce bidirectional scanning, selective scanning mechanisms, and spatiotemporal processing to enhance image and video understanding. Architectural innovations like position embeddings, cross-scan modules, and hierarchical designs further optimize the Mamba framework for global and local feature extraction. These advancements position Mamba as a promising architecture in computer vision research and applications.
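At the core of Mamba-style models is a discrete linear state-space recurrence whose cost grows linearly with sequence length; a minimal numpy sketch (matrices chosen for illustration, and omitting Mamba's input-dependent selectivity):

```python
import numpy as np

def ssm_scan(A, B, C, u):
    """Discrete linear SSM: h[t] = A h[t-1] + B u[t],  y[t] = C h[t].
    One pass over the sequence: O(L) versus attention's O(L^2)."""
    h = np.zeros(A.shape[0])
    ys = []
    for u_t in u:
        h = A @ h + B * u_t
        ys.append(C @ h)
    return np.array(ys)

A = np.diag([0.9, 0.5])          # stable diagonal state transition
B = np.array([1.0, 1.0])
C = np.array([1.0, -1.0])
y = ssm_scan(A, B, C, np.array([1.0, 0.0, 0.0]))  # impulse response of the system
```

ViM's bidirectional scanning runs such recurrences both forward and backward over flattened image patches.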
https://arxiv.org/abs/2502.07161
Large-scale surveys are essential tools for informing social science research and policy, but running surveys is costly and time-intensive. If we could accurately simulate group-level survey results, this would therefore be very valuable to social science research. Prior work has explored the use of large language models (LLMs) for simulating human behaviors, mostly through prompting. In this paper, we are the first to specialize LLMs for the task of simulating survey response distributions. As a testbed, we use country-level results from two global cultural surveys. We devise a fine-tuning method based on first-token probabilities to minimize divergence between predicted and actual response distributions for a given question. Then, we show that this method substantially outperforms other methods and zero-shot classifiers, even on unseen questions, countries, and a completely unseen survey. While even our best models struggle with the task, especially on unseen questions, our results demonstrate the benefits of specialization for simulation, which may accelerate progress towards sufficiently accurate simulation in the future.
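A fine-tuning objective of the flavor described, matching the model's first-token distribution over answer options to an observed response distribution, can be sketched as follows (the vocabulary, option tokens, and target shares are made up, and the paper's exact formulation may differ):

```python
import numpy as np

def first_token_kl(logits, option_ids, target_dist, eps=1e-12):
    """KL(target || predicted), where the predicted distribution is the model's
    first-token probability mass restricted to the answer-option tokens."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                          # softmax over the full vocabulary
    opt = probs[option_ids]
    opt /= opt.sum()                              # renormalize over option tokens only
    return float(np.sum(target_dist * np.log((target_dist + eps) / (opt + eps))))

logits = np.array([2.0, 1.0, 0.1, -1.0, 0.5])     # toy 5-token vocabulary
option_ids = [0, 1]                                # hypothetical tokens for options "A", "B"
survey_dist = np.array([0.7, 0.3])                 # made-up observed response shares
loss = first_token_kl(logits, option_ids, survey_dist)
```

Minimizing such a divergence trains the model to reproduce group-level response distributions rather than a single forced choice.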
https://arxiv.org/abs/2502.07068
Federated Continual Learning (FCL) has emerged as a robust solution for collaborative model training in dynamic environments, where data samples are continuously generated and distributed across multiple devices. This survey provides a comprehensive review of FCL, focusing on key challenges such as heterogeneity, model stability, communication overhead, and privacy preservation. We explore various forms of heterogeneity and their impact on model performance, and review solutions to non-IID data, resource-constrained platforms, and personalized learning to show the complexities of handling heterogeneous data distributions. Next, we review techniques for ensuring model stability and avoiding catastrophic forgetting, which are critical in non-stationary environments. We also review privacy-preserving techniques, another key aspect of FCL. Finally, this survey integrates insights from federated learning and continual learning to present strategies for improving the efficacy and scalability of FCL systems, making them applicable to a wide range of real-world scenarios.
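The federated half of FCL typically rests on aggregation rules such as FedAvg, which the heterogeneity challenges above complicate; a minimal sketch of size-weighted averaging (client weights and dataset sizes below are toy values):

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Federated averaging: aggregate client models weighted by local dataset size."""
    sizes = np.asarray(client_sizes, dtype=np.float64)
    coeffs = sizes / sizes.sum()
    return sum(c * w for c, w in zip(coeffs, client_weights))

w_a = np.array([1.0, 2.0])       # model parameters from client A (10 local samples)
w_b = np.array([3.0, 4.0])       # model parameters from client B (30 local samples)
w_global = fedavg([w_a, w_b], [10, 30])   # pulled toward the larger client
```

Under non-IID data this weighted mean can drift from any client's optimum, which is one reason the survey's personalization and stability techniques exist.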
https://arxiv.org/abs/2502.07059
Large Language Models (LLMs) are emerging as transformative tools for software vulnerability detection, addressing critical challenges in the security domain. Traditional methods, such as static and dynamic analysis, often falter due to inefficiencies, high false positive rates, and the growing complexity of modern software systems. By leveraging their ability to analyze code structures, identify patterns, and generate repair suggestions, LLMs, exemplified by models like GPT, BERT, and CodeBERT, present a novel and scalable approach to mitigating vulnerabilities. This paper provides a detailed survey of LLMs in vulnerability detection. It examines key aspects, including model architectures, application methods, target languages, fine-tuning strategies, datasets, and evaluation metrics. We also analyze the scope of current research problems, highlighting the strengths and weaknesses of existing approaches. Further, we address challenges such as cross-language vulnerability detection, multimodal data integration, and repository-level analysis. Based on these findings, we propose solutions for issues like dataset scalability, model interpretability, and applications in low-resource scenarios. Our contributions are threefold: (1) a systematic review of how LLMs are applied in vulnerability detection; (2) an analysis of shared patterns and differences across studies, with a unified framework for understanding the field; and (3) a summary of key challenges and future research directions. This work provides valuable insights for advancing LLM-based vulnerability detection. We also maintain and regularly update a list of the latest selected papers at this https URL.
https://arxiv.org/abs/2502.07049
The integration of preference alignment with diffusion models (DMs) has emerged as a transformative approach to enhance image generation and editing capabilities. Although integrating diffusion models with preference alignment strategies poses significant challenges for novices at this intersection, comprehensive and systematic reviews of this subject are still notably lacking. To bridge this gap, this paper extensively surveys preference alignment with diffusion models in image generation and editing. First, we systematically review cutting-edge optimization techniques such as reinforcement learning with human feedback (RLHF), direct preference optimization (DPO), and others, highlighting their pivotal role in aligning preferences with DMs. Then, we thoroughly explore the applications of aligning preferences with DMs in autonomous driving, medical imaging, robotics, and more. Finally, we comprehensively discuss the challenges of preference alignment with DMs. To our knowledge, this is the first survey centered on preference alignment with DMs, providing insights to drive future innovation in this dynamic area.
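Of the optimization techniques surveyed, DPO has the most compact form: a logistic loss on the policy's log-probability margins over a reference model. A minimal per-pair sketch (the log-probabilities below are made-up scalars standing in for summed token log-likelihoods):

```python
import numpy as np

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one (preferred, rejected) pair:
    -log sigmoid(beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l)))."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -np.log(1.0 / (1.0 + np.exp(-margin)))

# A policy that favors the preferred sample relative to the reference gets lower loss.
aligned = dpo_loss(logp_w=-1.0, logp_l=-5.0, ref_logp_w=-2.0, ref_logp_l=-2.0)
misaligned = dpo_loss(logp_w=-5.0, logp_l=-1.0, ref_logp_w=-2.0, ref_logp_l=-2.0)
```

Applied to diffusion models, the same preference margin is computed over denoising trajectories rather than token sequences.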
https://arxiv.org/abs/2502.07829
Recent advancements in AI-generated content have significantly improved the realism of 3D and 4D generation. However, most existing methods prioritize appearance consistency while neglecting underlying physical principles, leading to artifacts such as unrealistic deformations, unstable dynamics, and implausible object interactions. Incorporating physics priors into generative models has become a crucial research direction to enhance structural integrity and motion realism. This survey provides a review of physics-aware generative methods, systematically analyzing how physical constraints are integrated into 3D and 4D generation. First, we examine recent works incorporating physical priors into static and dynamic 3D generation, categorizing methods based on representation types, including vision-based, NeRF-based, and Gaussian Splatting-based approaches. Second, we explore emerging techniques in 4D generation, focusing on methods that model temporal dynamics with physical simulations. Finally, we conduct a comparative analysis of major methods, highlighting their strengths, limitations, and suitability for different materials and motion dynamics. By presenting an in-depth analysis of physics-grounded AIGC, this survey aims to bridge the gap between generative models and physical realism, providing insights that inspire future research in physically consistent content generation.
https://arxiv.org/abs/2502.07007
The increasing demand for Intelligent Transportation Systems (ITS) has introduced significant challenges in managing the complex, computation-intensive tasks generated by modern vehicles, while offloading tasks to external computing infrastructures such as edge computing (EC), nearby vehicular nodes, and UAVs has become an influential solution to these challenges. However, traditional computational offloading strategies often struggle to adapt to the dynamic and heterogeneous nature of vehicular environments. In this study, we explore the potential of Reinforcement Learning (RL) and Deep Reinforcement Learning (DRL) frameworks to optimize computational offloading through adaptive, real-time decision-making, and we thoroughly investigate the Markov Decision Process (MDP) approaches in the existing literature. The paper focuses on key aspects such as standardized learning models, optimized reward structures, and collaborative multi-agent systems, aiming to advance the understanding and application of DRL in vehicular networks. Our findings offer insights into enhancing the efficiency, scalability, and robustness of ITS, setting the stage for future innovations in this rapidly evolving field.
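A tabular Q-learning toy illustrates the MDP framing of offloading decisions discussed above (the states, actions, and rewards below are invented for illustration; real systems use DRL over far richer state spaces):

```python
import numpy as np

# Toy MDP: state = queue load (0 light, 1 heavy); actions: 0 = compute locally, 1 = offload.
# Illustrative rewards: local execution pays off under light load, offloading under heavy load.
R = np.array([[1.0, 0.2],        # light load: local is better
              [0.1, 1.0]])       # heavy load: offload is better
rng = np.random.default_rng(0)
Q = np.zeros((2, 2))
alpha, gamma, eps = 0.5, 0.9, 0.1

s = 0
for _ in range(2000):
    a = int(rng.integers(2)) if rng.random() < eps else int(Q[s].argmax())
    r = R[s, a]
    s_next = int(rng.integers(2))                # load fluctuates randomly
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

policy = Q.argmax(axis=1)        # learned rule: local when light, offload when heavy
```

DRL variants replace the table with a neural network so the policy can condition on channel quality, task size, deadlines, and neighbor availability.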
https://arxiv.org/abs/2502.06963
The explosive growth of video data has driven the development of distributed video analytics in cloud-edge-terminal collaborative (CETC) systems, enabling efficient video processing, real-time inference, and privacy-preserving analysis. Among multiple advantages, CETC systems can distribute video processing tasks and enable adaptive analytics across cloud, edge, and terminal devices, leading to breakthroughs in video surveillance, autonomous driving, and smart cities. In this survey, we first analyze fundamental architectural components, including hierarchical, distributed, and hybrid frameworks, alongside edge computing platforms and resource management mechanisms. Building upon these foundations, edge-centric approaches emphasize on-device processing, edge-assisted offloading, and edge intelligence, while cloud-centric methods leverage powerful computational capabilities for complex video understanding and model training. Our investigation also covers hybrid video analytics incorporating adaptive task offloading and resource-aware scheduling techniques that optimize performance across the entire system. Beyond conventional approaches, recent advances in large language models and multimodal integration reveal both opportunities and challenges in platform scalability, data protection, and system reliability. Future directions also encompass explainable systems, efficient processing mechanisms, and advanced video analytics, offering valuable insights for researchers and practitioners in this dynamic field.
https://arxiv.org/abs/2502.06581
Computational neuroimaging involves analyzing brain images or signals to provide mechanistic insights and predictive tools for human cognition and behavior. While diffusion models have shown stability and high-quality generation in natural images, there is increasing interest in adapting them to analyze brain data for various neurological tasks such as data enhancement, disease diagnosis, and brain decoding. This survey provides an overview of recent efforts to integrate diffusion models into computational neuroimaging. We begin by introducing the common neuroimaging data modalities, followed by the diffusion formulations and conditioning mechanisms. We then discuss how variations in the denoising starting point, condition input, and generation target of diffusion models are developed to enhance specific neuroimaging tasks. For a comprehensive overview of ongoing research, we provide a publicly available repository at this https URL.
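The diffusion formulations reviewed share the closed-form forward (noising) process; a minimal DDPM-style numpy sketch (the schedule and data are standard illustrative choices, not tied to any neuroimaging paper):

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng):
    """Closed-form DDPM forward process:
    x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * noise,
    where abar_t is the cumulative product of (1 - beta)."""
    abar = np.cumprod(1.0 - betas)[t]
    noise = rng.normal(size=x0.shape)
    return np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * noise, abar

rng = np.random.default_rng(0)
x0 = np.ones(1000)                         # stand-in for a flattened brain volume
betas = np.linspace(1e-4, 0.02, 1000)      # common linear noise schedule
x_early, abar_early = forward_diffuse(x0, 10, betas, rng)    # signal mostly intact
x_late, abar_late = forward_diffuse(x0, 999, betas, rng)     # nearly pure noise
```

The task variations the survey describes amount to changing where denoising starts (`x_t` for some `t`), what conditions it (e.g., a diagnosis label or paired modality), and what the reverse process generates.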
https://arxiv.org/abs/2502.06552