Large Language Models (LLMs) have achieved impressive performance on complex reasoning tasks with Chain-of-Thought (CoT) prompting. However, conventional CoT relies on reasoning steps explicitly verbalized in natural language, introducing inefficiencies and limiting its applicability to abstract reasoning. To address this, there has been growing research interest in latent CoT reasoning, where inference occurs within latent spaces. By decoupling reasoning from language, latent reasoning promises richer cognitive representations and more flexible, faster inference. Researchers have explored various directions in this promising field, including training methodologies, structural innovations, and internal reasoning mechanisms. This paper presents a comprehensive overview and analysis of this reasoning paradigm. We begin by proposing a unified taxonomy from four perspectives: token-wise strategies, internal mechanisms, analysis, and applications. We then provide in-depth discussions and comparative analyses of representative methods, highlighting their design patterns, strengths, and open challenges. We aim to provide a structured foundation for advancing this emerging direction in LLM reasoning. The relevant papers will be regularly updated at this https URL.
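To make the latent-vs-verbalized distinction concrete, below is a toy NumPy sketch (random weights as stand-ins for a trained transformer; an illustrative assumption, not any surveyed method's actual architecture). Explicit CoT collapses each reasoning step to a discrete token, while latent CoT feeds the hidden state straight back as the next input:

```python
import numpy as np

rng = np.random.default_rng(0)
D, V = 16, 50  # hidden size and vocabulary size (toy values)

W_h = rng.normal(0, 0.1, (D, D))    # stand-in "transformer body"
W_out = rng.normal(0, 0.1, (V, D))  # LM head: hidden state -> token logits
E = rng.normal(0, 0.1, (V, D))      # token embedding table

def step(x):
    """One forward pass: input embedding -> last hidden state."""
    return np.tanh(W_h @ x)

def explicit_step(x):
    """Verbalized CoT: decode to a token, keeping only the argmax."""
    h = step(x)
    token = int(np.argmax(W_out @ h))
    return E[token], token  # re-embedding discards the rest of h

x_e = E[3]
for _ in range(4):
    x_e, tok = explicit_step(x_e)   # information bottlenecked at each step
verbalized_answer = int(np.argmax(W_out @ x_e))

x = E[3]
for _ in range(4):
    x = step(x)                     # latent CoT: continuous thought, never decoded
answer = int(np.argmax(W_out @ x))  # only the final answer is verbalized
```

The contrast is the point: the explicit path throws away everything but one token per step, whereas the latent path carries the full D-dimensional state between steps.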
https://arxiv.org/abs/2505.16782
Jailbreak attacks pose a serious threat to large language models (LLMs) by bypassing built-in safety mechanisms and leading to harmful outputs. Studying these attacks is crucial for identifying vulnerabilities and improving model security. This paper presents a systematic survey of jailbreak methods from the novel perspective of stealth. We find that existing attacks struggle to simultaneously achieve toxic stealth (concealing toxic content) and linguistic stealth (maintaining linguistic naturalness). Motivated by this, we propose StegoAttack, a fully stealthy jailbreak attack that uses steganography to hide the harmful query within benign, semantically coherent text. The attack then prompts the LLM to extract the hidden query and respond in an encrypted manner. This approach effectively hides malicious intent while preserving naturalness, allowing it to evade both built-in and external safety mechanisms. We evaluate StegoAttack on four safety-aligned LLMs from major providers, benchmarking against eight state-of-the-art methods. StegoAttack achieves an average attack success rate (ASR) of 92.00%, outperforming the strongest baseline by 11.0%. Its ASR drops by less than 1% even under external detection (e.g., Llama Guard). Moreover, it attains the optimal comprehensive scores on stealth detection metrics, demonstrating both high efficacy and exceptional stealth capabilities. The code is available at this https URL
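As a deliberately simplistic illustration of the hiding step, the toy acrostic below embeds a query in the first letters of benign sentences. The paper's actual steganographic encoding and encrypted-response protocol are more sophisticated; the cover sentences and scheme here are invented for illustration only:

```python
# Toy steganography: message = first letter of each benign sentence.
# This acrostic is an illustrative stand-in, NOT StegoAttack's encoding.

COVER = {  # hypothetical benign cover sentence for each needed initial
    "h": "Hiking trails near the lake are lovely in spring.",
    "i": "I enjoy reading about local history.",
}

def hide(message: str) -> str:
    """Embed `message` as the initial letters of benign sentences."""
    return " ".join(COVER[c] for c in message.lower())

def extract(stego_text: str) -> str:
    """Recover the hidden message from sentence initials."""
    sentences = [s.strip() for s in stego_text.split(".") if s.strip()]
    return "".join(s[0].lower() for s in sentences)

stego = hide("hi")  # semantically coherent text hiding the payload
```

In the attack setting, the LLM itself is prompted to perform the `extract` step and answer covertly, which is what lets the malicious intent evade both toxicity and naturalness filters.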
https://arxiv.org/abs/2505.16765
Adapting cultural values in Large Language Models (LLMs) presents significant challenges, particularly due to biases and limited training data. Prior work primarily aligns LLMs with different cultural values using World Values Survey (WVS) data. However, it remains unclear whether this approach effectively captures cultural nuances or produces distinct cultural representations for various downstream tasks. In this paper, we systematically investigate WVS-based training for cultural value adaptation and find that relying solely on survey data can homogenize cultural norms and interfere with factual knowledge. To address these issues, we augment WVS with encyclopedic and scenario-based cultural narratives from Wikipedia and NormAd. While these narratives may have variable effects on downstream tasks, they consistently improve cultural distinctiveness compared to survey data alone. Our work highlights the inherent complexity of aligning cultural values with the goal of guiding task-specific behavior.
https://arxiv.org/abs/2505.16408
Materials are the foundation of modern society, underpinning advancements in energy, electronics, healthcare, transportation, and infrastructure. The ability to discover and design new materials with tailored properties is critical to solving some of the most pressing global challenges. In recent years, the growing availability of high-quality materials data combined with rapid advances in Artificial Intelligence (AI) has opened new opportunities for accelerating materials discovery. Data-driven generative models provide a powerful tool for materials design by directly creating novel materials that satisfy predefined property requirements. Despite the proliferation of related work, there remains a notable lack of up-to-date and systematic surveys in this area. To fill this gap, this paper provides a comprehensive overview of recent progress in AI-driven materials generation. We first categorize various types of materials and illustrate multiple representations of crystalline materials. We then provide a detailed summary and taxonomy of current AI-driven materials generation approaches. Furthermore, we discuss the common evaluation metrics and summarize open-source codes and benchmark datasets. Finally, we conclude with potential future directions and challenges in this fast-growing field. The related sources can be found at this https URL.
https://arxiv.org/abs/2505.16379
The exponential growth of scientific publications has made it increasingly difficult for researchers to stay updated and synthesize knowledge effectively. This paper presents XSum, a modular pipeline for multi-document summarization (MDS) in the scientific domain using Retrieval-Augmented Generation (RAG). The pipeline includes two core components: a question-generation module and an editor module. The question-generation module dynamically generates questions adapted to the input papers, ensuring the retrieval of relevant and accurate information. The editor module synthesizes the retrieved content into coherent and well-structured summaries that adhere to academic standards for proper citation. Evaluated on the SurveySum dataset, XSum demonstrates strong performance, achieving considerable improvements in metrics such as CheckEval, G-Eval and Ref-F1 compared to existing approaches. This work provides a transparent, adaptable framework for scientific summarization with potential applications in a wide range of domains. Code available at this https URL
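The two-module pipeline described above can be sketched as follows. Note that `call_llm` is a stub and the prompts, function names, and naive word-overlap retrieval are illustrative assumptions, not XSum's actual implementation:

```python
# Minimal sketch of a question-generation + editor RAG pipeline for MDS.

def call_llm(prompt: str) -> str:
    # Stub; a real pipeline would wrap an actual LLM API here.
    return f"[LLM output for: {prompt[:40]}...]"

def question_generation(papers: list[str]) -> list[str]:
    """Dynamically derive questions adapted to the input papers."""
    return [call_llm(f"Generate a survey question about: {p}") for p in papers]

def retrieve(question: str, papers: list[str]) -> list[str]:
    """Naive retrieval stand-in: keep papers sharing any word with the question."""
    q_words = set(question.lower().split())
    return [p for p in papers if q_words & set(p.lower().split())] or papers[:1]

def editor(questions: list[str], papers: list[str]) -> str:
    """Synthesize retrieved content into one structured summary, one section per question."""
    sections = []
    for q in questions:
        ctx = " ".join(retrieve(q, papers))
        sections.append(call_llm(f"Answer '{q}' using: {ctx}"))
    return "\n".join(sections)

papers = ["Paper A on RAG evaluation", "Paper B on citation quality"]
summary = editor(question_generation(papers), papers)
```

The modularity is the design point: question generation, retrieval, and editing can each be swapped out and evaluated independently.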
https://arxiv.org/abs/2505.16349
Ultra-high-definition (UHD) image restoration aims to specifically solve the problem of quality degradation in ultra-high-resolution images. Recent advancements in this field are predominantly driven by deep learning-based innovations, including enhancements in dataset construction, network architecture, sampling strategies, prior knowledge integration, and loss functions. In this paper, we systematically review recent progress in UHD image restoration, covering various aspects ranging from dataset construction to algorithm design. This serves as a valuable resource for understanding state-of-the-art developments in the field. We begin by summarizing degradation models for various image restoration subproblems, such as super-resolution, low-light enhancement, deblurring, dehazing, deraining, and desnowing, and emphasizing the unique challenges of their application to UHD image restoration. We then highlight existing UHD benchmark datasets and organize the literature according to degradation types and dataset construction methods. Following this, we showcase major milestones in deep learning-driven UHD image restoration, reviewing the progression of restoration tasks, technological developments, and evaluations of existing methods. We further propose a classification framework based on network architectures and sampling strategies, helping to clearly organize existing methods. Finally, we share insights into the current research landscape and propose directions for further advancements. A related repository is available at this https URL.
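For orientation, the degradation models that such surveys typically start from take the following textbook forms (generic notation, not any single paper's formulation):

```latex
% Super-resolution: blur kernel k, downsampling by factor s, additive noise n
y = (x \otimes k)\downarrow_s + n
% Low-light enhancement (Retinex decomposition): reflectance R, illumination L
I = R \circ L
% Dehazing (atmospheric scattering): transmission t, global atmospheric light A
I(p) = J(p)\,t(p) + A\bigl(1 - t(p)\bigr)
```

The UHD-specific challenge is that these inverse problems must be solved at resolutions where full-image inference exceeds GPU memory, which is why sampling strategies appear alongside architectures in the taxonomy.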
https://arxiv.org/abs/2505.16161
Social media's rise establishes user-generated content (UGC) as pivotal for travel decisions, yet analytical methods lack scalability. This study introduces a dual-method LLM framework: unsupervised expectation extraction from UGC paired with survey-informed supervised fine-tuning. Findings reveal leisure/social expectations drive engagement more than foundational natural/emotional factors. By establishing LLMs as precision tools for expectation quantification, we advance tourism analytics methodology and propose targeted strategies for experience personalization and social travel promotion. The framework's adaptability extends to consumer behavior research, demonstrating computational social science's transformative potential in marketing optimization.
https://arxiv.org/abs/2505.16118
Large language models (LLMs) are introducing a paradigm shift in molecular discovery by enabling text-guided interaction with chemical spaces through natural language and symbolic notations, with emerging extensions that incorporate multi-modal inputs. To advance the new field of LLMs for molecular discovery, this survey provides an up-to-date and forward-looking review of the emerging use of LLMs for two central tasks: molecule generation and molecule optimization. Based on our proposed taxonomy for both problems, we analyze representative techniques in each category, highlighting how LLM capabilities are leveraged across different learning settings. In addition, we cover the commonly used datasets and evaluation protocols. We conclude by discussing key challenges and future directions, positioning this survey as a resource for researchers working at the intersection of LLMs and molecular science. A continuously updated reading list is available at this https URL.
https://arxiv.org/abs/2505.16094
With advancements in large audio-language models (LALMs), which enhance large language models (LLMs) with auditory capabilities, these models are expected to demonstrate universal proficiency across various auditory tasks. While numerous benchmarks have emerged to assess LALMs' performance, they remain fragmented and lack a structured taxonomy. To bridge this gap, we conduct a comprehensive survey and propose a systematic taxonomy for LALM evaluations, categorizing them into four dimensions based on their objectives: (1) General Auditory Awareness and Processing, (2) Knowledge and Reasoning, (3) Dialogue-oriented Ability, and (4) Fairness, Safety, and Trustworthiness. We provide detailed overviews within each category and highlight challenges in this field, offering insights into promising future directions. To the best of our knowledge, this is the first survey specifically focused on the evaluations of LALMs, providing clear guidelines for the community. We will release the collection of the surveyed papers and actively maintain it to support ongoing advancements in the field.
https://arxiv.org/abs/2505.15957
Integrating Large Language Models (LLMs) and Evolutionary Computation (EC) represents a promising avenue for advancing artificial intelligence by combining powerful natural language understanding with optimization and search capabilities. This manuscript explores the synergistic potential of LLMs and EC, reviewing their intersections, complementary strengths, and emerging applications. We identify key opportunities where EC can enhance LLM training, fine-tuning, prompt engineering, and architecture search, while LLMs can, in turn, aid in automating the design, analysis, and interpretation of ECs. The survey examines these bidirectional contributions in detail. First, it shows how EC techniques enhance LLMs by optimizing key components such as prompt engineering, hyperparameter tuning, and architecture search, demonstrating how evolutionary methods automate and refine these processes. Second, it investigates how LLMs improve EC by automating metaheuristic design, tuning evolutionary algorithms, and generating adaptive heuristics, thereby increasing efficiency and scalability. Emerging co-evolutionary frameworks are discussed, showcasing applications across diverse fields while acknowledging challenges like computational costs, interpretability, and algorithmic convergence. The survey concludes by identifying open research questions and advocating for hybrid approaches that combine the strengths of EC and LLMs.
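The "EC enhances LLMs" direction can be illustrated with a toy evolutionary prompt search: mutate candidate prompts and keep the fittest. The fitness function below is a stub (closeness to a target phrasing); a real system would score each prompt by downstream LLM task accuracy, and the word pool is invented for the sketch:

```python
import random

# Toy (mu + lambda)-style evolutionary search over prompts.
WORDS = ["please", "carefully", "step", "by", "think", "answer", "explain"]

def mutate(prompt: list[str]) -> list[str]:
    """Replace one randomly chosen word with a random word from the pool."""
    child = prompt.copy()
    child[random.randrange(len(child))] = random.choice(WORDS)
    return child

def fitness(prompt: list[str]) -> int:
    # Stub objective: positions matching a target phrasing. A real fitness
    # would be held-out task accuracy of an LLM given this prompt.
    target = ["think", "step", "by", "step"]
    return sum(a == b for a, b in zip(prompt, target))

random.seed(0)
population = [[random.choice(WORDS) for _ in range(4)] for _ in range(10)]
for _ in range(200):
    population.sort(key=fitness, reverse=True)      # selection
    population = population[:5] + [mutate(p) for p in population[:5]]  # elitism + offspring
best = max(population, key=fitness)
```

Elitism guarantees the best fitness never decreases, so the loop hill-climbs toward the target phrasing without ever computing gradients, which is exactly what makes EC attractive for discrete prompt spaces.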
https://arxiv.org/abs/2505.15741
The increasing prevalence of mental health disorders globally highlights the urgent need for effective digital screening methods that can be used in multilingual contexts. Most existing studies, however, focus on English data, overlooking critical mental health signals that may be present in non-English texts. To address this important gap, we present the first survey on the detection of mental health disorders using multilingual social media data. We investigate the cultural nuances that influence online language patterns and self-disclosure behaviors, and how these factors can impact the performance of NLP tools. Additionally, we provide a comprehensive list of multilingual data collections that can be used for developing NLP models for mental health screening. Our findings can inform the design of effective multilingual mental health screening tools that can meet the needs of diverse populations, ultimately improving mental health outcomes on a global scale.
https://arxiv.org/abs/2505.15556
Commuting origin-destination (OD) flows, which capture citizens' daily population mobility, are vital for sustainable development across cities around the world. However, the data are challenging to obtain due to the high cost of travel surveys and privacy concerns. Surprisingly, we find that satellite imagery, publicly available across the globe, contains rich urban semantic signals to support high-quality OD flow generation, capturing over 98% of the expressiveness of traditional, hard-to-collect multisource urban data on sociodemographics, economics, land use, and points of interest. This inspires us to design a novel data generator, GlODGen, which can generate OD flow data for any city of interest around the world. Specifically, GlODGen first leverages Vision-Language Geo-Foundation Models to extract urban semantic signals related to human mobility from satellite imagery. These features are then combined with population data to form region-level representations, which are used to generate OD flows via graph diffusion models. Extensive experiments on 4 continents and 6 representative cities show that GlODGen generalizes well across diverse urban environments on different continents and can generate OD flow data for global cities highly consistent with real-world mobility data. We implement GlODGen as an automated tool, seamlessly integrating data acquisition and curation, urban semantic feature extraction, and OD flow generation. It has been released at this https URL.
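GlODGen's graph diffusion generator is beyond a short sketch, but the shape of the task (region features in, an OD flow matrix out) can be illustrated with the classical gravity-model baseline, a standard stand-in rather than the paper's method:

```python
import numpy as np

def gravity_od(population: np.ndarray, coords: np.ndarray, beta: float = 1.0):
    """Gravity-model OD baseline: flow_ij ~ pop_i * pop_j / dist(i, j)**beta.

    `population` is (N,), `coords` is (N, 2); returns an (N, N) flow matrix
    with zeros on the diagonal (no self-flows).
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)  # dividing by inf zeroes the diagonal
    return np.outer(population, population) / dist**beta

pop = np.array([1000.0, 2000.0, 500.0])          # toy region populations
xy = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])  # toy region centroids
od = gravity_od(pop, xy)
```

A learned generator like GlODGen replaces the hand-set distance-decay form with region representations extracted from imagery, but it consumes and produces the same kind of objects.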
https://arxiv.org/abs/2505.15870
Graph-structured data pervades domains such as social networks, biological systems, knowledge graphs, and recommender systems. While foundation models have transformed natural language processing, vision, and multimodal learning through large-scale pretraining and generalization, extending these capabilities to graphs -- characterized by non-Euclidean structures and complex relational semantics -- poses unique challenges and opens new opportunities. To this end, Graph Foundation Models (GFMs) aim to bring scalable, general-purpose intelligence to structured data, enabling broad transfer across graph-centric tasks and domains. This survey provides a comprehensive overview of GFMs, unifying diverse efforts under a modular framework comprising three key components: backbone architectures, pretraining strategies, and adaptation mechanisms. We categorize GFMs by their generalization scope -- universal, task-specific, and domain-specific -- and review representative methods, key innovations, and theoretical insights within each category. Beyond methodology, we examine theoretical foundations including transferability and emergent capabilities, and highlight key challenges such as structural alignment, heterogeneity, scalability, and evaluation. Positioned at the intersection of graph learning and general-purpose AI, GFMs are poised to become foundational infrastructure for open-ended reasoning over structured data. This survey consolidates current progress and outlines future directions to guide research in this rapidly evolving field. Resources are available at this https URL.
https://arxiv.org/abs/2505.15116
The exponential growth of data-driven systems and AI technologies has intensified the demand for high-quality web-sourced datasets. While existing datasets have proven valuable, conventional web data collection approaches face significant limitations in terms of human effort and scalability. Current data-collection solutions fall into two categories: wrapper-based methods that struggle with adaptability and reproducibility, and large language model (LLM)-based approaches that incur substantial computational and financial costs. To address these challenges, we propose AutoData, a novel multi-agent system for Automated web Data collection, that requires minimal human intervention, i.e., only necessitating a natural language instruction specifying the desired dataset. In addition, AutoData is designed with a robust multi-agent architecture, featuring a novel oriented message hypergraph coordinated by a central task manager, to efficiently organize agents across research and development squads. Furthermore, we introduce a novel hypergraph cache system to advance the multi-agent collaboration process that enables efficient automated data collection and mitigates the token cost issues prevalent in existing LLM-based systems. Moreover, we present Instruct2DS, a new benchmark dataset supporting live data collection from web sources across three domains: academic, finance, and sports. Comprehensive evaluations over Instruct2DS and three existing benchmark datasets demonstrate AutoData's superior performance compared to baseline methods. Case studies on challenging tasks such as picture book collection and paper extraction from surveys further validate its applicability. Our source code and dataset are available at this https URL.
https://arxiv.org/abs/2505.15859
While Knowledge Editing has been extensively studied in monolingual settings, it remains underexplored in multilingual contexts. This survey systematizes recent research on Multilingual Knowledge Editing (MKE), a growing subdomain of model editing focused on ensuring factual edits generalize reliably across languages. We present a comprehensive taxonomy of MKE methods, covering parameter-based, memory-based, fine-tuning, and hypernetwork approaches. We survey available benchmarks, summarize key findings on method effectiveness and transfer patterns, identify challenges in cross-lingual propagation, and highlight open problems related to language anisotropy, evaluation coverage, and edit scalability. Our analysis consolidates a rapidly evolving area and lays the groundwork for future progress in editable language-aware LLMs.
https://arxiv.org/abs/2505.14393
Vision-language modeling (VLM) aims to bridge the information gap between images and natural language. Under the new paradigm of first pre-training on massive image-text pairs and then fine-tuning on task-specific data, VLM in the remote sensing domain has made significant progress. The resulting models benefit from the absorption of extensive general knowledge and demonstrate strong performance across a variety of remote sensing data analysis tasks. Moreover, they are capable of interacting with users in a conversational manner. In this paper, we aim to provide the remote sensing community with a timely and comprehensive review of the developments in VLM using the two-stage paradigm. Specifically, we first cover a taxonomy of VLM in remote sensing: contrastive learning, visual instruction tuning, and text-conditioned image generation. For each category, we detail the commonly used network architecture and pre-training objectives. Second, we conduct a thorough review of existing works, examining foundation models and task-specific adaptation methods in contrastive-based VLM, architectural upgrades, training strategies and model capabilities in instruction-based VLM, as well as generative foundation models with their representative downstream applications. Third, we summarize datasets used for VLM pre-training, fine-tuning, and evaluation, with an analysis of their construction methodologies (including image sources and caption generation) and key properties, such as scale and task adaptability. Finally, we conclude this survey with insights and discussions on future research directions: cross-modal representation alignment, vague requirement comprehension, explanation-driven model reliability, continually scalable model capabilities, and large-scale datasets featuring richer modalities and greater challenges.
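The contrastive-learning category above rests on a CLIP-style symmetric InfoNCE objective: matched image-caption pairs are pulled together, mismatched pairs pushed apart. Below is a minimal NumPy sketch of that loss with random embeddings as stand-ins for encoder outputs (illustrative only, not any surveyed model's code):

```python
import numpy as np

def clip_loss(img: np.ndarray, txt: np.ndarray, temp: float = 0.07) -> float:
    """Symmetric InfoNCE over an (N, D) batch of paired embeddings."""
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    logits = img @ txt.T / temp        # (N, N) cosine similarities
    labels = np.arange(len(img))       # i-th image matches i-th caption

    def ce(l):
        # Row-wise softmax cross-entropy against the diagonal labels.
        p = np.exp(l - l.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        return -np.log(p[labels, labels]).mean()

    return float((ce(logits) + ce(logits.T)) / 2)

rng = np.random.default_rng(0)
# Unpaired random embeddings: loss near log(N), i.e. chance level.
loss = clip_loss(rng.normal(size=(8, 32)), rng.normal(size=(8, 32)))
```

When the two towers produce aligned embeddings for true pairs, the diagonal dominates each row of `logits` and the loss approaches zero; that alignment is what later enables zero-shot scene classification and retrieval in remote sensing.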
https://arxiv.org/abs/2505.14361
Plane geometry problem solving (PGPS) has recently gained significant attention as a benchmark to assess the multi-modal reasoning capabilities of large vision-language models. Despite the growing interest in PGPS, the research community still lacks a comprehensive overview that systematically synthesizes recent work in PGPS. To fill this gap, we present a survey of existing PGPS studies. We first categorize PGPS methods into an encoder-decoder framework and summarize the corresponding output formats used by their encoders and decoders. Subsequently, we classify and analyze these encoders and decoders according to their architectural designs. Finally, we outline major challenges and promising directions for future research. In particular, we discuss the hallucination issues arising during the encoding phase within encoder-decoder architectures, as well as the problem of data leakage in current PGPS benchmarks.
https://arxiv.org/abs/2505.14340
The transition from 5G networks to 6G highlights a significant demand for machine learning (ML). Deep learning models, in particular, have seen wide application in mobile networking and communications to support advanced services in emerging wireless environments, such as smart healthcare, smart grids, autonomous vehicles, aerial platforms, digital twins, and the metaverse. The rapid expansion of Internet-of-Things (IoT) devices, many with limited computational capabilities, has accelerated the development of tiny machine learning (TinyML) and resource-efficient ML approaches for cost-effective services. However, the deployment of large-scale machine learning (LargeML) solutions requires major computing resources and complex management strategies to support extensive IoT services and ML-generated content applications. Consequently, the integration of TinyML and LargeML is projected as a promising approach for future seamless connectivity and efficient resource management. Although the integration of TinyML and LargeML shows abundant potential, several challenges persist, including performance optimization, practical deployment strategies, effective resource management, and security considerations. In this survey, we review and analyze the latest research aimed at enabling the integration of TinyML and LargeML models for the realization of smart services and applications in future 6G networks and beyond. The paper concludes by outlining critical challenges and identifying future research directions for the holistic integration of TinyML and LargeML in next-generation wireless networks.
https://arxiv.org/abs/2505.15854
Place recognition is a cornerstone of vehicle navigation and mapping, pivotal in enabling systems to determine whether a location has been previously visited. This capability is critical for tasks such as loop closure in Simultaneous Localization and Mapping (SLAM) and long-term navigation under varying environmental conditions. In this survey, we comprehensively review recent advancements in place recognition, emphasizing three representative methodological paradigms: Convolutional Neural Network (CNN)-based approaches, Transformer-based frameworks, and cross-modal strategies. We begin by elucidating the significance of place recognition within the broader context of autonomous systems. Subsequently, we trace the evolution of CNN-based methods, highlighting their contributions to robust visual descriptor learning and scalability in large-scale environments. We then examine the emerging class of Transformer-based models, which leverage self-attention mechanisms to capture global dependencies and offer improved generalization across diverse scenes. Furthermore, we discuss cross-modal approaches that integrate heterogeneous data sources such as Lidar, vision, and text descriptions, thereby enhancing resilience to viewpoint, illumination, and seasonal variations. We also summarize standard datasets and evaluation metrics widely adopted in the literature. Finally, we identify current research challenges and outline prospective directions, including domain adaptation, real-time performance, and lifelong learning, to inspire future advancements in this domain. A unified framework of leading-edge place recognition methods (i.e., a code library) and the results of their experimental evaluations are available at this https URL.
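At inference time, all three paradigms above reduce to the same retrieval primitive: match a query descriptor against a database of mapped places by similarity. A minimal sketch with random vectors standing in for learned CNN/Transformer descriptors:

```python
import numpy as np

def cosine_retrieve(query: np.ndarray, database: np.ndarray) -> int:
    """Return the index of the most similar database descriptor."""
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    return int(np.argmax(db @ q))

rng = np.random.default_rng(0)
db = rng.normal(size=(100, 256))               # descriptors of 100 mapped places
query = db[42] + 0.05 * rng.normal(size=256)   # revisit of place 42, perturbed
match = cosine_retrieve(query, db)             # loop-closure candidate
```

The research effort surveyed here goes into making the descriptors themselves invariant to viewpoint, illumination, and season, so that a real revisit perturbs the query no more than the small noise simulated above.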
https://arxiv.org/abs/2505.14068
Software Quality Assurance (SQA) is critical for delivering reliable, secure, and efficient software products. The Software Quality Assurance Process aims to provide assurance that work products and processes comply with predefined provisions and plans. Recent advancements in Large Language Models (LLMs) present new opportunities to enhance existing SQA processes by automating tasks like requirement analysis, code review, test generation, and compliance checks. Simultaneously, established standards such as ISO/IEC 12207, ISO/IEC 25010, ISO/IEC 5055, ISO 9001/ISO/IEC 90003, CMMI, and TMM provide structured frameworks for ensuring robust quality practices. This paper surveys the intersection of LLM-based SQA methods and these recognized standards, highlighting how AI-driven solutions can augment traditional approaches while maintaining compliance and process maturity. We first review the foundational software quality standards and the technical fundamentals of LLMs in software engineering. Next, we explore various LLM-based SQA applications, including requirement validation, defect detection, test generation, and documentation maintenance. We then map these applications to key software quality frameworks, illustrating how LLMs can address specific requirements and metrics within each standard. Empirical case studies and open-source initiatives demonstrate the practical viability of these methods. At the same time, discussions on challenges (e.g., data privacy, model bias, explainability) underscore the need for deliberate governance and auditing. Finally, we propose future directions encompassing adaptive learning, privacy-focused deployments, multimodal analysis, and evolving standards for AI-driven software quality.
https://arxiv.org/abs/2505.13766