Due to privacy and security concerns, the need to erase unwanted information from pre-trained vision models is becoming increasingly evident. In real-world scenarios, erasure requests may arise at any time from both users and model owners, and these requests usually form a sequence. Under such a setting, selected information is expected to be continuously removed from a pre-trained model while the rest is preserved. We define this problem as continual forgetting and identify three key challenges. (i) For unwanted knowledge, efficient and effective deletion is crucial. (ii) For remaining knowledge, the impact of the forgetting procedure should be minimal. (iii) In real-world scenarios, the training samples may be scarce or partially missing during the forgetting process. To address these challenges, we first propose Group Sparse LoRA (GS-LoRA). Specifically, towards (i), we introduce LoRA modules to fine-tune the FFN layers in Transformer blocks for each forgetting task independently, and towards (ii), we adopt a simple group sparse regularization that enables automatic selection of specific LoRA groups while zeroing out the others. To further extend GS-LoRA to more practical scenarios, we incorporate prototype information as additional supervision and introduce a more practical approach, GS-LoRA++. For each forgotten class, we move the logits away from its original prototype; for the remaining classes, we pull the logits closer to their respective prototypes. We conduct extensive experiments on face recognition, object detection, and image classification, and demonstrate that our method forgets specific classes with minimal impact on other classes. Code has been released at this https URL.
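A minimal sketch of the group-sparse selection idea in PyTorch; the module shape and the exact form of the regularizer are assumptions, not the authors' released implementation. Each LoRA pair attached to an FFN layer forms one group, and a group-lasso penalty drives whole groups to zero so that only the modules needed for forgetting stay active.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer with a trainable low-rank update W + B @ A."""
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x):
        return self.base(x) + x @ self.A.T @ self.B.T

def group_sparse_penalty(lora_layers, alpha: float = 1e-3):
    # Sum of per-group Frobenius norms: sparsity acts at the *group* level,
    # so whole LoRA modules are selected or zeroed out automatically.
    return alpha * sum(
        torch.sqrt((m.A ** 2).sum() + (m.B ** 2).sum()) for m in lora_layers
    )
```

During forgetting, this penalty would simply be added to the task losses, e.g. `loss = forget_loss + retain_loss + group_sparse_penalty(lora_layers)`.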
https://arxiv.org/abs/2501.09705
Electroencephalogram (EEG) signals have emerged as a promising modality for biometric identification. While previous studies have explored the use of imagined speech with semantically meaningful words for subject identification, most have relied on additional visual or auditory cues. In this study, we introduce a cueless EEG-based imagined speech paradigm, in which subjects imagine the pronunciation of semantically meaningful words without any external cues. This approach addresses the limitations of prior methods by having subjects naturally select and imagine words from a predefined list. The dataset comprises over 4,350 trials from 11 subjects across five sessions. We assess a variety of classification methods, including traditional machine learning techniques such as Support Vector Machines (SVM) and XGBoost, as well as time-series foundation models and deep learning architectures specifically designed for EEG classification, such as EEG Conformer and Shallow ConvNet. A session-based hold-out validation strategy was employed to ensure reliable evaluation and prevent data leakage. Our results demonstrate outstanding classification accuracy, reaching 97.93%. These findings highlight the potential of cueless EEG paradigms for secure and reliable subject identification in real-world applications, such as brain-computer interfaces (BCIs).
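A small illustration of the session-based hold-out described above (array names and shapes are assumptions): every trial from the held-out session is excluded from training, so within-session correlations cannot leak into the test score.

```python
import numpy as np

def session_holdout_split(X, y, sessions, test_session):
    """X: (n_trials, channels, time); sessions: (n_trials,) session ids."""
    test_mask = sessions == test_session
    return X[~test_mask], y[~test_mask], X[test_mask], y[test_mask]

# e.g., train on sessions 1-4 of each subject, evaluate on session 5:
# X_tr, y_tr, X_te, y_te = session_holdout_split(X, y, sessions, test_session=5)
```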
https://arxiv.org/abs/2501.09700
Hallucination remains a major challenge for Large Vision-Language Models (LVLMs). Direct Preference Optimization (DPO) has gained increasing attention as a simple solution to hallucination issues. It learns directly from constructed preference pairs that reflect the severity of hallucinations in responses to the same prompt and image. Nonetheless, the different data construction methods in existing works lead to notable performance variations. We identify a crucial factor here: outcomes are largely contingent on whether the constructed data aligns on-policy with respect to the initial (reference) policy of DPO. Theoretical analysis suggests that learning from off-policy data is impeded by the KL-divergence between the updated policy and the reference policy. From the perspective of dataset distribution, we systematically summarize the inherent flaws of existing algorithms that employ DPO to address hallucination issues. To alleviate these problems, we propose the On-Policy Alignment (OPA)-DPO framework, which uniquely leverages expert feedback to correct hallucinated responses and aligns both the original and expert-revised responses in an on-policy manner. Notably, with only 4.8k training samples, OPA-DPO achieves an additional reduction in the hallucination rate of LLaVA-1.5-7B compared to the previous SOTA algorithm trained with 16k samples: 13.26% on the AMBER benchmark and 5.39% on the Object-Hal benchmark.
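For reference, a sketch of the standard DPO objective that OPA-DPO builds on; OPA-DPO's contribution lies in constructing the preference pairs on-policy with expert-revised responses, not in changing this loss.

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta: float = 0.1):
    """Sequence log-probs of the preferred (w) and dispreferred (l) responses
    under the trained policy and the frozen reference policy."""
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -F.logsigmoid(beta * margin).mean()
```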
https://arxiv.org/abs/2501.09695
Open-Vocabulary Part Segmentation (OVPS) is an emerging field for recognizing fine-grained parts in unseen categories. We identify two primary challenges in OVPS: (1) the difficulty in aligning part-level image-text correspondence, and (2) the lack of structural understanding in segmenting object parts. To address these issues, we propose PartCATSeg, a novel framework that integrates object-aware part-level cost aggregation, compositional loss, and structural guidance from DINO. Our approach employs a disentangled cost aggregation strategy that handles object and part-level costs separately, enhancing the precision of part-level segmentation. We also introduce a compositional loss to better capture part-object relationships, compensating for the limited part annotations. Additionally, structural guidance from DINO features improves boundary delineation and inter-part understanding. Extensive experiments on Pascal-Part-116, ADE20K-Part-234, and PartImageNet datasets demonstrate that our method significantly outperforms state-of-the-art approaches, setting a new baseline for robust generalization to unseen part categories.
https://arxiv.org/abs/2501.09688
Language has long been conceived as an essential tool for human reasoning. The breakthrough of Large Language Models (LLMs) has sparked significant research interest in leveraging these models to tackle complex reasoning tasks. Researchers have moved beyond simple autoregressive token generation by introducing the concept of "thought" -- a sequence of tokens representing intermediate steps in the reasoning process. This paradigm enables LLMs to mimic complex human reasoning processes, such as tree search and reflective thinking. Recently, an emerging trend of learning to reason has applied reinforcement learning (RL) to train LLMs to master reasoning processes. This approach enables the automatic generation of high-quality reasoning trajectories through trial-and-error search algorithms, significantly expanding LLMs' reasoning capacity by providing substantially more training data. Furthermore, recent studies demonstrate that encouraging LLMs to "think" with more tokens during test-time inference can further boost reasoning accuracy significantly. Together, train-time and test-time scaling chart a new research frontier -- a path toward Large Reasoning Models. The introduction of OpenAI's o1 series marks a significant milestone in this research direction. In this survey, we present a comprehensive review of recent progress in LLM reasoning. We begin by introducing the foundational background of LLMs and then explore the key technical components driving the development of large reasoning models, with a focus on automated data construction, learning-to-reason techniques, and test-time scaling. We also analyze popular open-source projects aimed at building large reasoning models, and conclude with open challenges and future research directions.
https://arxiv.org/abs/2501.09686
This tutorial provides an in-depth guide to inference-time guidance and alignment methods for optimizing downstream reward functions in diffusion models. While diffusion models are renowned for their generative modeling capabilities, practical applications in fields such as biology often require sample generation that maximizes specific metrics (e.g., stability or affinity in proteins, closeness to target structures). In these scenarios, diffusion models can be adapted not only to generate realistic samples but also to explicitly maximize desired measures at inference time, without fine-tuning. This tutorial explores the foundational aspects of such inference-time algorithms. We review these methods from a unified perspective, demonstrating that current techniques -- such as Sequential Monte Carlo (SMC)-based guidance, value-based sampling, and classifier guidance -- aim to approximate soft optimal denoising processes (a.k.a. policies in RL) that combine pre-trained denoising processes with value functions acting as look-ahead predictors of terminal rewards from intermediate states. Within this framework, we present several novel algorithms not yet covered in the literature. Furthermore, we discuss (1) fine-tuning methods combined with inference-time techniques, (2) inference-time algorithms based on search algorithms such as Monte Carlo tree search, which have received limited attention in current research, and (3) connections between inference-time algorithms in language models and diffusion models. The code of this tutorial on protein design is available at this https URL.
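A schematic sketch of SMC-based guidance in the tutorial's framing; `denoise_step` and `value_fn` are assumed callables standing in for the pre-trained denoising process and the learned look-ahead value function. Particles are resampled at each step toward states whose value predicts a higher terminal reward, approximating the soft optimal policy.

```python
import torch

def smc_guided_sampling(denoise_step, value_fn, x_T, num_steps, lam=1.0):
    x = x_T                                         # (n_particles, ...) initial noise
    logw_prev = value_fn(x, num_steps) / lam
    for t in reversed(range(num_steps)):
        x = denoise_step(x, t)                      # one step of the pre-trained process
        logw = value_fn(x, t) / lam                 # look-ahead value as log-weight
        w = torch.softmax(logw - logw_prev, dim=0)  # incremental weights
        idx = torch.multinomial(w, x.shape[0], replacement=True)
        x, logw_prev = x[idx], logw[idx]            # resample particles
    return x
```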
https://arxiv.org/abs/2501.09685
Designing efficient quantum circuits that realize a quantum advantage over classical computing has become increasingly critical. Genetic algorithms have shown potential in generating such circuits through artificial evolution. However, integrating quantum advantage into the fitness function of these algorithms remains unexplored. In this paper, we aim to enhance the efficiency of quantum circuit design by proposing two novel approaches for incorporating quantum advantage metrics into the fitness function of genetic algorithms. We evaluate our approaches on the Bernstein-Vazirani Problem and the Unstructured Database Search Problem as test cases. The results demonstrate that our approaches not only improve the convergence speed of the genetic algorithm but also produce circuits comparable to expert-designed solutions. Our findings suggest that automated quantum circuit design using genetic algorithms that incorporate a measure of quantum advantage is a promising approach to accelerating the development of quantum algorithms.
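One way such a fitness function could look (an illustrative stand-in, not the paper's exact metrics): correctness on the target problem is combined with a bonus for using fewer oracle queries than a classical strategy would need, e.g. O(sqrt(N)) quantum vs. O(N) classical queries for unstructured search.

```python
from dataclasses import dataclass

@dataclass
class CircuitStats:              # hypothetical summary of an evolved circuit
    success_probability: float   # estimated via simulation, in [0, 1]
    oracle_queries: int

def fitness(c: CircuitStats, classical_queries: int, w: float = 0.5) -> float:
    # Reward correctness, plus a bonus for beating the classical query count.
    advantage = max(0.0, 1.0 - c.oracle_queries / classical_queries)
    return (1.0 - w) * c.success_probability + w * advantage
```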
https://arxiv.org/abs/2501.09682
With the number of people with disabilities (PWD) increasing worldwide each year, the demand for mobility support to enable independent living and social integration is also growing. Wheelchairs commonly support the mobility of PWD in both indoor and outdoor environments. However, current powered wheelchairs (PWC) often fail to meet the needs of PWD, who may find it difficult to operate them. Furthermore, existing research on robotic wheelchairs typically focuses either on full autonomy or enhanced manual control, which can lead to reduced efficiency and user trust. To address these issues, this paper proposes a Robot Operating System (ROS)-based smart wheelchair, called CoNav Chair, that incorporates a shared control navigation algorithm and obstacle avoidance to support PWD while fostering efficiency and trust between the robot and the user. Our design consists of hardware and software components. Experimental results conducted in a typical indoor social environment demonstrate the performance and effectiveness of the smart wheelchair hardware and software design. This integrated design promotes trust and autonomy, which are crucial for the acceptance of assistive mobility technologies in the built environment.
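A minimal sketch of one common shared-control scheme consistent with the description above (the paper's actual arbitration logic may differ): the user's joystick command is blended linearly with the planner's command, shifting authority toward the planner as obstacles get closer.

```python
def blend_commands(user_cmd, planner_cmd, obstacle_distance, d_safe=1.0):
    """Each cmd is (linear_velocity, angular_velocity)."""
    w = min(1.0, obstacle_distance / d_safe)   # 1 = trust user, 0 = trust planner
    return tuple(w * u + (1.0 - w) * p for u, p in zip(user_cmd, planner_cmd))
```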
https://arxiv.org/abs/2501.09680
The rapid deployment of autonomous AI agents creates urgent challenges around authorization, accountability, and access control in digital spaces. New standards are needed to establish on whose behalf AI agents act and to guide their use appropriately, protecting online spaces while unlocking the value of task delegation to autonomous agents. We introduce a novel framework for authenticated, authorized, and auditable delegation of authority to AI agents, in which human users can securely delegate and restrict the permissions and scope of agents while maintaining clear chains of accountability. This framework builds on existing identity and access management protocols, extending OAuth 2.0 and OpenID Connect with agent-specific credentials and metadata while maintaining compatibility with established authentication and web infrastructure. Further, we propose a framework for translating flexible, natural language permissions into auditable access control configurations, enabling robust scoping of AI agent capabilities across diverse interaction modalities. Taken together, this practical approach facilitates immediate deployment of AI agents while addressing key security and accountability concerns. It works toward ensuring that agentic AI systems perform only appropriate actions, and provides digital service providers with a tool to enable AI agent interactions without risking harm from scalable interaction.
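An illustrative, hypothetical set of JWT claims for such a delegation token: the standard OAuth 2.0 / OpenID Connect claims carry over unchanged, while the agent-specific fields (names invented here for illustration, not a published standard) bind the agent to its human delegator and a restricted scope.

```python
delegation_token_claims = {
    "iss": "https://idp.example.com",      # issuing identity provider
    "sub": "user:alice",                   # the delegating human principal
    "aud": "https://api.example.com",
    "exp": 1737072000,
    "scope": "calendar:read email:draft",  # permissions granted to the agent
    # agent-specific extensions (illustrative names):
    "agent_id": "agent:alice/scheduler-v1",
    "delegator": "user:alice",             # explicit chain of accountability
    "delegation_constraints": {"max_spend_usd": 0, "valid_for": "24h"},
}
```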
https://arxiv.org/abs/2501.09674
The proliferation of Vision-Language Models (VLMs) in the past several years calls for rigorous and comprehensive evaluation methods and benchmarks. This work analyzes existing VLM evaluation techniques, including automated metrics, AI-based assessments, and human evaluations across diverse tasks. We first introduce Robin -- a novel suite of VLMs that we built by combining Large Language Models (LLMs) and Vision Encoders (VEs) at multiple scales -- and use Robin to identify shortcomings of current evaluation approaches across scales. Next, to overcome the identified limitations, we introduce CHIRP -- a new long-form response benchmark we developed for more robust and complete VLM evaluation. We provide open access to the Robin training code, model suite, and CHIRP benchmark to promote reproducibility and advance VLM research.
https://arxiv.org/abs/2501.09672
Autonomous docking remains one of the most challenging maneuvers in marine robotics, requiring precise control and robust perception in confined spaces. This paper presents a novel approach integrating Model Predictive Path Integral (MPPI) control with real-time LiDAR-based dock detection for autonomous surface vessel docking. Our framework uniquely combines probabilistic trajectory optimization with a multi-objective cost function that simultaneously considers docking precision, safety constraints, and motion efficiency. The MPPI controller generates optimal trajectories by intelligently sampling control sequences and evaluating their costs based on dynamic clearance requirements, orientation alignment, and target position objectives. We introduce an adaptive dock detection pipeline that processes LiDAR point clouds to extract critical geometric features, enabling real-time updates of docking parameters. The proposed method is extensively validated in a physics-based simulation environment that incorporates realistic sensor noise, vessel dynamics, and environmental constraints. Results demonstrate successful docking from various initial positions while maintaining safe clearances and smooth motion characteristics.
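A compact MPPI sketch in the spirit of the controller described above; `dynamics` and `cost` are assumed callables, and in this setting the cost would bundle the clearance, orientation-alignment, and target-position terms. Sampled control perturbations are averaged with softmax weights on negative trajectory cost.

```python
import numpy as np

def mppi_step(dynamics, cost, x0, u_nominal, n_samples=256, sigma=0.2, lam=1.0):
    H, m = u_nominal.shape                      # horizon, control dimension
    noise = np.random.randn(n_samples, H, m) * sigma
    costs = np.zeros(n_samples)
    for k in range(n_samples):
        x = x0
        for t in range(H):
            x = dynamics(x, u_nominal[t] + noise[k, t])
            costs[k] += cost(x)                 # clearance + alignment + target terms
    w = np.exp(-(costs - costs.min()) / lam)    # lower cost -> higher weight
    w /= w.sum()
    return u_nominal + np.tensordot(w, noise, axes=1)   # weighted control update
```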
https://arxiv.org/abs/2501.09668
The recent rise in the popularity of large language models has spurred the development of the extensive code datasets needed to train them. This has left limited code available for collection and use in the downstream investigation of specific behaviors, or for the evaluation of large language models without suffering from data contamination. To address this problem, we release The Heap, a large multilingual dataset covering 57 programming languages that has been deduplicated with respect to other open datasets of code, enabling researchers to conduct fair evaluations of large language models without significant data cleaning overhead.
https://arxiv.org/abs/2501.09653
Online motion planning is a challenging problem for intelligent robots moving in dense environments with dynamic obstacles, e.g., crowds. In this work, we propose a novel approach for optimal and safe online motion planning with minimal information about dynamic obstacles. Specifically, our approach requires only the current position of the obstacles and their maximum speed; it does not need any information about their exact trajectories or dynamic models. The proposed methodology combines Monte Carlo Tree Search (MCTS), for online optimal planning via model simulations, with Velocity Obstacles (VO), for obstacle avoidance. We perform experiments in a cluttered simulated environment with walls and up to 40 dynamic obstacles moving with random velocities and directions. With an ablation study, we show the key contribution of VO in scaling up the efficiency of MCTS by selecting the safest and most rewarding actions in the tree of simulations. Moreover, we show the superiority of our methodology over state-of-the-art planners, including Non-linear Model Predictive Control (NMPC), in terms of collision rate, computational cost, and task performance.
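A minimal geometric velocity-obstacle check of the kind used to prune unsafe actions in the tree. The paper assumes only the obstacles' positions and maximum speeds, so in practice the cone would be evaluated conservatively over possible obstacle velocities; this sketch takes an assumed obstacle velocity for simplicity.

```python
import numpy as np

def violates_vo(p_rob, v_rob, p_obs, v_obs, radius_sum):
    """True if the candidate robot velocity heads into the collision cone."""
    rel_p = p_obs - p_rob
    rel_v = v_rob - v_obs
    dist = np.linalg.norm(rel_p)
    if dist <= radius_sum:
        return True                             # already in collision
    # half-angle of the cone subtended by the inflated obstacle
    cone_half_angle = np.arcsin(radius_sum / dist)
    angle = np.arccos(
        np.clip(rel_v @ rel_p / (np.linalg.norm(rel_v) * dist + 1e-9), -1, 1)
    )
    return angle < cone_half_angle              # heading inside the cone
```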
https://arxiv.org/abs/2501.09649
In many real-world applications, agents must make sequential decisions in environments where conditions are subject to change due to various exogenous factors. These non-stationary environments pose significant challenges to traditional decision-making models, which typically assume stationary dynamics. Non-stationary Markov decision processes (NS-MDPs) offer a framework for modeling and solving decision problems under such changing conditions. However, the lack of standardized benchmarks and simulation tools has hindered systematic evaluation and advancement in this field. We present NS-Gym, the first simulation toolkit designed explicitly for NS-MDPs, integrated within the popular Gymnasium framework. In NS-Gym, we segregate the evolution of the environmental parameters that characterize non-stationarity from the agent's decision-making module, allowing for modular and flexible adaptation to dynamic environments. We review prior work in this domain and present a toolkit encapsulating key problem characteristics and types in NS-MDPs. This toolkit is the first effort to develop a set of standardized interfaces and benchmark problems that enable consistent and reproducible evaluation of algorithms under non-stationary conditions. We also use NS-Gym to benchmark six algorithmic approaches from prior work on NS-MDPs. Our vision is that NS-Gym will enable researchers to assess the adaptability and robustness of their decision-making algorithms to non-stationary conditions.
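The separation NS-Gym enforces can be illustrated with a plain Gymnasium wrapper (a sketch of the idea, not NS-Gym's actual API): the schedule of environment parameters evolves exogenously, outside the agent's decision-making module.

```python
import gymnasium as gym

class NonStationaryWrapper(gym.Wrapper):
    def __init__(self, env, param_schedule):
        super().__init__(env)
        self.param_schedule = param_schedule    # t -> dict of env parameters
        self.t = 0

    def step(self, action):
        # exogenous change first, decision second: the agent never controls it
        for name, value in self.param_schedule(self.t).items():
            setattr(self.env.unwrapped, name, value)
        self.t += 1
        return self.env.step(action)

# e.g., gravity drifting over time in a classic-control task:
# env = NonStationaryWrapper(gym.make("CartPole-v1"),
#                            lambda t: {"gravity": 9.8 + 0.01 * t})
```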
https://arxiv.org/abs/2501.09646
In today's assistant landscape, personalisation enhances interactions, fosters long-term relationships, and deepens engagement. However, many systems struggle to retain user preferences, leading to repetitive user requests and disengagement. Furthermore, the unregulated and opaque extraction of user preferences in industry applications raises significant concerns about privacy and trust, especially in regions with stringent regulations like Europe. In response to these challenges, we propose a long-term memory system for voice assistants, structured around predefined categories. This approach leverages Large Language Models to efficiently extract, store, and retrieve preferences within these categories, ensuring both personalisation and transparency. We also introduce a synthetic multi-turn, multi-session conversation dataset (CarMem), grounded in real industry data and tailored to an in-car voice assistant setting. Benchmarked on this dataset, our system achieves an F1-score of 0.78 to 0.95 in preference extraction, depending on category granularity. Our maintenance strategy reduces redundant preferences by 95% and contradictory ones by 92%, while the accuracy of optimal retrieval is 0.87. Collectively, these results demonstrate the system's suitability for industrial applications.
https://arxiv.org/abs/2501.09645
The pivotal shift from traditional paper-based records to sophisticated Electronic Health Records (EHR) enabled the systematic collection and analysis of patient data through descriptive statistics, providing insight into patterns and trends across patient populations. This evolution continued toward predictive analytics, allowing healthcare providers to anticipate patient outcomes and potential complications before they occur. This progression from basic digital record-keeping to sophisticated predictive modelling and digital twins reflects healthcare's broader evolution toward more integrated, patient-centred approaches that combine data-driven insights with personalized care delivery. This chapter explores the evolution and significance of healthcare information systems, beginning with an examination of the implementation of EHR in the UK and the USA. It provides a comprehensive overview of the International Classification of Diseases (ICD) system, tracing its development from ICD-9 to ICD-10. Central to this discussion is the MIMIC-III database, a landmark achievement in healthcare data sharing and arguably the most comprehensive critical care database freely available to researchers worldwide. MIMIC-III has democratized access to high-quality healthcare data, enabling unprecedented opportunities for research and analysis. The chapter examines its structure, clinical outcome analysis capabilities, and practical applications through case studies, with a particular focus on mortality and length-of-stay metrics, vital signs extraction, and ICD coding. Through detailed entity-relationship diagrams and practical examples, the text illustrates MIMIC's complex data structure and demonstrates how different querying approaches can lead to subtly different results, emphasizing the critical importance of understanding the database's architecture for accurate data extraction.
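As an example of the kind of querying the chapter walks through, here is an illustrative length-of-stay query against MIMIC-III's ICUSTAYS table (column names follow the public MIMIC-III schema; the local SQLite extract is an assumption). Choices such as excluding stays with a missing OUTTIME are exactly where different querying approaches start to diverge.

```python
import sqlite3

conn = sqlite3.connect("mimic3.db")   # assumed local extract of MIMIC-III
los_query = """
SELECT subject_id, hadm_id, icustay_id,
       (julianday(outtime) - julianday(intime)) AS los_days
FROM icustays
WHERE outtime IS NOT NULL            -- drop stays without a recorded discharge
ORDER BY los_days DESC;
"""
rows = conn.execute(los_query).fetchall()
```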
https://arxiv.org/abs/2501.09640
Face recognition technology has dramatically transformed the landscape of security, surveillance, and authentication systems, offering a user-friendly and non-invasive biometric solution. However, despite its significant advantages, face recognition systems face increasing threats from physical and digital spoofing attacks. Current research typically treats face recognition and attack detection as distinct classification challenges. This approach necessitates the implementation of separate models for each task, leading to considerable computational complexity, particularly on devices with limited resources. Such inefficiencies can stifle scalability and hinder performance. In response to these challenges, this paper introduces an innovative unified model designed for face recognition and detection of physical and digital attacks. By leveraging the advanced Swin Transformer backbone and incorporating HiLo attention in a convolutional neural network framework, we address unified face recognition and spoof attack detection more effectively. Moreover, we introduce augmentation techniques that replicate the traits of physical and digital spoofing cues, significantly enhancing our model's robustness. Through comprehensive experimental evaluation across various datasets, we showcase the effectiveness of our model in unified face recognition and spoof detection. Additionally, we confirm its resilience against unseen physical and digital spoofing attacks, underscoring its potential for real-world applications.
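A schematic layout of a unified model consistent with the abstract (dimensions and head design are placeholders; the Swin backbone and HiLo attention internals are omitted): a single shared encoder feeds both an identity-embedding head and a spoof-classification head.

```python
import torch.nn as nn

class UnifiedFaceModel(nn.Module):
    def __init__(self, backbone, feat_dim=768, emb_dim=512):
        super().__init__()
        self.backbone = backbone                     # e.g., a Swin-style encoder
        self.id_head = nn.Linear(feat_dim, emb_dim)  # face-recognition embedding
        self.spoof_head = nn.Linear(feat_dim, 3)     # live / physical / digital

    def forward(self, x):
        f = self.backbone(x)                 # shared features for both tasks
        return self.id_head(f), self.spoof_head(f)
```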
https://arxiv.org/abs/2501.09635
Planning for autonomous systems typically requires reasoning with models at different levels of abstraction, and the harmonization of two competing sets of objectives: high-level mission goals that refer to an interaction of the system with the external environment, and low-level platform constraints that aim to preserve the integrity and the correct interaction of the subsystems. The complicated interplay between these two models makes it very hard to reason about the system as a whole, especially when the objective is to find plans with robustness guarantees, considering the non-deterministic behavior of the lower layers of the system. In this paper, we introduce the problem of Platform-Aware Mission Planning (PAMP), addressing it in the setting of temporal durative actions. The PAMP problem differs from standard temporal planning in its exists-forall nature: the high-level plan dealing with mission goals is required to satisfy safety and executability constraints for all possible non-deterministic executions of the low-level model of the platform and the environment. We propose two approaches for solving PAMP. The first, baseline approach amalgamates the mission and platform levels, while the second is based on an abstraction-refinement loop that leverages the combination of a planner and a verification engine. We prove the soundness and completeness of the proposed approaches and validate them experimentally, demonstrating the importance of heterogeneous modeling and the superiority of the technique based on abstraction-refinement.
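The abstraction-refinement approach can be summarized in a short loop (a schematic sketch; `planner` and `verifier` are assumed black boxes): plan against an abstract platform model, verify the plan against all non-deterministic executions of the full model, and refine the abstraction with any counterexample found.

```python
def plan_with_refinement(mission_model, platform_abstraction, planner, verifier):
    while True:
        plan = planner.solve(mission_model, platform_abstraction)
        if plan is None:
            return None                          # mission provably unachievable
        counterexample = verifier.check(plan)    # any failing execution?
        if counterexample is None:
            return plan                          # safe for all executions
        # rule out the spurious behavior and plan again
        platform_abstraction = platform_abstraction.refine(counterexample)
```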
https://arxiv.org/abs/2501.09632
As artificial intelligence (AI) becomes increasingly embedded in healthcare delivery, this chapter explores the critical aspects of developing reliable and ethical Clinical Decision Support Systems (CDSS). Beginning with the fundamental transition from traditional statistical models to sophisticated machine learning approaches, this work examines rigorous validation strategies and performance assessment methods, including the crucial role of model calibration and decision curve analysis. The chapter emphasizes that creating trustworthy AI systems in healthcare requires more than just technical accuracy; it demands careful consideration of fairness, explainability, and privacy. The challenge of ensuring equitable healthcare delivery through AI is stressed, discussing methods to identify and mitigate bias in clinical predictive models. The chapter then delves into explainability as a cornerstone of human-centered CDSS. This focus reflects the understanding that healthcare professionals must not only trust AI recommendations but also comprehend their underlying reasoning. The discussion then advances to an analysis of privacy vulnerabilities in medical AI systems, from data leakage in deep learning models to sophisticated attacks against model explanations. The text explores privacy-preservation strategies such as differential privacy and federated learning, while acknowledging the inherent trade-offs between privacy protection and model performance. This progression, from technical validation to ethical considerations, reflects the multifaceted challenges of developing AI systems that can be seamlessly and reliably integrated into daily clinical practice while maintaining the highest standards of patient care and data protection.
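As one concrete example of the evaluation tools the chapter covers, decision curve analysis reduces to a small computation: at a threshold probability p_t, a model's net benefit trades true positives against false positives weighted by the odds p_t / (1 - p_t). A minimal sketch:

```python
import numpy as np

def net_benefit(y_true, y_prob, p_t):
    """Standard decision-curve net benefit at threshold probability p_t."""
    treat = y_prob >= p_t                    # patients the model flags for action
    n = len(y_true)
    tp = np.sum(treat & (y_true == 1))
    fp = np.sum(treat & (y_true == 0))
    return tp / n - (fp / n) * (p_t / (1 - p_t))
```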
https://arxiv.org/abs/2501.09628
Recent advances in large language models (LLMs) have demonstrated significant progress in performing complex tasks. While Reinforcement Learning from Human Feedback (RLHF) has been effective in aligning LLMs with human preferences, it is susceptible to spurious correlations in reward modeling. Consequently, it often introduces biases, such as length bias, sycophancy, conceptual bias, and discrimination, that hinder the model's ability to capture true causal relationships. To address this, we propose a novel causal reward modeling approach that integrates causal inference to mitigate these spurious correlations. Our method enforces counterfactual invariance, ensuring reward predictions remain consistent when irrelevant variables are altered. Through experiments on both synthetic and real-world datasets, we show that our approach mitigates various types of spurious correlations effectively, resulting in more reliable and fair alignment of LLMs with human preferences. As a drop-in enhancement to the existing RLHF workflow, our causal reward modeling provides a practical way to improve the trustworthiness and fairness of LLM finetuning.
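One plausible reading of the counterfactual-invariance constraint, as a sketch (an assumed formulation, not the paper's exact loss): the reward model's score should not move when an irrelevant variable of a response is altered, e.g. padding that changes only its length.

```python
import torch

def invariance_penalty(reward_model, prompts, responses, perturb, weight=1.0):
    """perturb(response) edits an irrelevant variable, e.g. appends filler text."""
    r = reward_model(prompts, responses)
    r_cf = reward_model(prompts, [perturb(x) for x in responses])
    return weight * ((r - r_cf) ** 2).mean()    # penalize reward shifts
```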
https://arxiv.org/abs/2501.09620