Q-learning methods are widely used in robot path planning but often face challenges of inefficient search and slow convergence. We propose an Improved Q-learning (IQL) framework that enhances standard Q-learning in two significant ways. First, we introduce the Path Adaptive Collaborative Optimization (PACO) algorithm to optimize Q-table initialization, providing better initial estimates and accelerating learning. Second, we incorporate a Utility-Controlled Heuristic (UCH) mechanism with dynamically tuned parameters to optimize the reward function, enhancing the algorithm's accuracy and effectiveness in path-planning tasks. Extensive experiments in three different raster grid environments validate the superior performance of our IQL framework. The results demonstrate that our IQL algorithm outperforms existing methods, including FIQL, PP-QL-based CPP, DFQL, and QMABC algorithms, in terms of path-planning capabilities.
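As a rough illustration of the two ingredients, consider tabular Q-learning started from a pre-optimized table and trained with a shaped reward. The sketch below is not the paper's PACO or UCH formulation: `q_init` stands in for a PACO-style initialization, the distance-based utility term is an assumed form of heuristic shaping, and the `env` interface is hypothetical.

```python
import numpy as np

def heuristic_reward(state, goal, base_reward, eta=0.5):
    # Assumed UCH-style shaping: a utility bonus that decays with the
    # remaining Manhattan distance to the goal.
    dist = abs(state[0] - goal[0]) + abs(state[1] - goal[1])
    return base_reward + eta / (1.0 + dist)

def q_learning(env, q_init, goal, episodes=500, alpha=0.1, gamma=0.95, eps=0.1):
    # Standard tabular Q-learning, except the table starts from q_init
    # (a dict mapping state -> action-value array) instead of zeros.
    q = {s: v.copy() for s, v in q_init.items()}
    for _ in range(episodes):
        s, done = env.reset(), False           # hypothetical env interface
        while not done:
            a = env.sample_action() if np.random.rand() < eps else int(np.argmax(q[s]))
            s2, r, done = env.step(a)
            shaped = heuristic_reward(s2, goal, r)
            q[s][a] += alpha * (shaped + gamma * np.max(q[s2]) - q[s][a])
            s = s2
    return q
```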
https://arxiv.org/abs/2501.05411
Recent advances in digital pathology have demonstrated the effectiveness of foundation models across diverse applications. In this report, we present a novel vision foundation model based on the RudolfV approach. Our model was trained on a dataset comprising 1.2 million histopathology whole slide images, collected from two medical institutions: Mayo Clinic and Charité - Universitätsmedizin Berlin. Comprehensive evaluations show that our model achieves state-of-the-art performance across twenty-one public benchmark datasets, even though it is neither the largest model by parameter count nor by training dataset size.
https://arxiv.org/abs/2501.05409
Modern deep learning (DL) workloads increasingly use complex deep reinforcement learning (DRL) algorithms that generate training data within the learning loop. This results in programs with several nested loops and dynamic data dependencies between tensors. While DL systems with eager execution support such dynamism, they lack the optimizations and smart scheduling of graph-based execution. Graph-based execution, however, cannot express dynamic tensor shapes, instead requiring the use of multiple static subgraphs. Either execution model for DRL thus leads to redundant computation, reduced parallelism, and less efficient memory management. We describe TimeRL, a system for executing dynamic DRL programs that combines the dynamism of eager execution with the whole-program optimizations and scheduling of graph-based execution. TimeRL achieves this by introducing the declarative programming model of recurrent tensors, which allows users to define dynamic dependencies as intuitive recurrence equations. TimeRL translates recurrent tensors into a polyhedral dependence graph (PDG) with dynamic dependencies as symbolic expressions. Through simple PDG transformations, TimeRL applies whole-program optimizations, such as automatic vectorization, incrementalization, and operator fusion. The PDG also allows for the computation of an efficient program-wide execution schedule, which decides on buffer deallocations, buffer donations, and GPU/CPU memory swapping. We show that TimeRL executes current DRL algorithms up to 47$\times$ faster than existing DRL systems, while using 16$\times$ less GPU peak memory.
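To make the notion of a recurrent tensor concrete, the toy sketch below declares the discounted-return tensor of an episode through a recurrence; this illustrates the kind of recurrence equation TimeRL's programming model targets, not its actual API, which is not shown in the abstract.

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    # The user-level intent is the recurrence  G[t] = r[t] + gamma * G[t+1];
    # a system like TimeRL could analyze such a dependence pattern to
    # vectorize, incrementalize, or schedule it, rather than executing
    # this naive reverse loop eagerly.
    G = np.zeros(len(rewards))
    G[-1] = rewards[-1]
    for t in range(len(rewards) - 2, -1, -1):
        G[t] = rewards[t] + gamma * G[t + 1]
    return G

print(discounted_returns([1.0, 0.0, 0.0, 2.0]))
```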
https://arxiv.org/abs/2501.05408
We present a Monte-Carlo simulation algorithm for real-time policy improvement of an adaptive controller. In the Monte-Carlo simulation, the long-term expected reward of each possible action is statistically measured, using the initial policy to make decisions in each step of the simulation. The action maximizing the measured expected reward is then taken, resulting in an improved policy. Our algorithm is easily parallelizable and has been implemented on the IBM SP1 and SP2 parallel-RISC supercomputers. We have obtained promising initial results in applying this algorithm to the domain of backgammon. Results are reported for a wide variety of initial policies, ranging from a random policy to TD-Gammon, an extremely strong multi-layer neural network. In each case, the Monte-Carlo algorithm gives a substantial reduction, by as much as a factor of 5 or more, in the error rate of the base players. The algorithm is also potentially useful in many other adaptive control applications in which it is possible to simulate the environment.
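The core procedure is easy to state. A minimal sketch of Monte-Carlo rollout policy improvement, assuming a hypothetical simulator interface (`env.clone`, `sim.step`, `env.legal_actions`):

```python
def rollout_value(env, state, action, base_policy, n_sims=100, horizon=200):
    # Estimate the long-term reward of taking `action` in `state`, then
    # following `base_policy`, by averaging simulated episode returns.
    total = 0.0
    for _ in range(n_sims):
        sim = env.clone(state)                 # hypothetical simulator API
        reward, done = sim.step(action)
        steps = 0
        while not done and steps < horizon:
            r, done = sim.step(base_policy(sim.state))
            reward += r
            steps += 1
        total += reward
    return total / n_sims

def improved_policy(env, state, base_policy):
    # Act greedily with respect to the measured rollout values; this is
    # one step of policy improvement over the base policy.
    return max(env.legal_actions(state),
               key=lambda a: rollout_value(env, state, a, base_policy))
```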
https://arxiv.org/abs/2501.05407
Time series generation models are crucial for applications like data augmentation and privacy preservation. Most existing time series generation models are typically designed to generate data from one specified domain. While leveraging data from other domains for better generalization has proven effective in other application areas, this approach remains challenging for time series modeling due to the large divergence in patterns among different real-world time series categories. In this paper, we propose a multi-domain time series diffusion model with domain prompts, named TimeDP. In TimeDP, we utilize a time series semantic prototype module which defines time series prototypes to represent a time series basis, each prototype vector serving as a "word" representing some elementary time series feature. A prototype assignment module is applied to extract domain-specific prototype weights, for learning domain prompts as the generation condition. During sampling, we extract a "domain prompt" from few-shot samples of the target domain and use the domain prompt as the condition to generate time series samples. Experiments demonstrate that our method outperforms baselines, providing state-of-the-art in-domain generation quality and strong unseen-domain generation capability.
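A minimal sketch of what a prototype-assignment step could look like, assuming encoded series features and a learned prototype bank as plain arrays; the softmax weighting is an illustrative reading of the module, not the paper's exact definition:

```python
import numpy as np

def domain_prompt(features, prototypes, temperature=1.0):
    # features:   (n_samples, d) few-shot series encodings from a target domain
    # prototypes: (n_prototypes, d) the "vocabulary" of elementary patterns
    sims = (features @ prototypes.T) / temperature
    w = np.exp(sims - sims.max(axis=1, keepdims=True))   # row-wise softmax
    w /= w.sum(axis=1, keepdims=True)
    return w.mean(axis=0)   # averaged prototype weights act as the domain prompt
```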
https://arxiv.org/abs/2501.05403
Missing data in time-series analysis poses significant challenges, affecting the reliability of downstream applications. Imputation, the process of estimating missing values, has emerged as a key solution. This paper introduces BRATI, a novel deep-learning model designed to address multivariate time-series imputation by combining Bidirectional Recurrent Networks and Attention mechanisms. BRATI processes temporal dependencies and feature correlations across long and short time horizons, utilizing two imputation blocks that operate in opposite temporal directions. Each block integrates recurrent layers and attention mechanisms to effectively resolve long-term dependencies. We evaluate BRATI on three real-world datasets under diverse missing-data scenarios: randomly missing values, fixed-length missing sequences, and variable-length missing sequences. Our findings demonstrate that BRATI consistently outperforms state-of-the-art models, delivering superior accuracy and robustness in imputing multivariate time-series data.
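A heavily simplified sketch of the bidirectional idea in PyTorch: two directional blocks (recurrence plus attention) produce estimates, one on the reversed sequence, and observed values are kept while missing ones are filled with the averaged predictions. Layer sizes and the combination rule are assumptions, not BRATI's actual architecture:

```python
import torch
import torch.nn as nn

class DirectionalImputer(nn.Module):
    # One imputation block: a GRU over time followed by self-attention,
    # mapping a (batch, time, features) series to per-step estimates.
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(n_features, hidden, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.out = nn.Linear(hidden, n_features)

    def forward(self, x):
        h, _ = self.rnn(x)
        h, _ = self.attn(h, h, h)
        return self.out(h)

def impute(x, mask, fwd, bwd):
    # mask is 1 where x is observed, 0 where missing; the backward block
    # sees the time-reversed series and its output is reversed back.
    est = 0.5 * (fwd(x) + torch.flip(bwd(torch.flip(x, dims=[1])), dims=[1]))
    return mask * x + (1 - mask) * est
```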
https://arxiv.org/abs/2501.05401
Safe knife practices in the kitchen significantly reduce the risk of cuts, injuries, and serious accidents during food preparation. Using YOLOv7, an advanced object detection model, this study focuses on identifying safety risks during knife handling, particularly improper finger placement and blade contact with the hand. The model's performance was evaluated using metrics such as precision, recall, mAP50, and mAP50-95. The results demonstrate that YOLOv7 achieved its best performance at epoch 31, with a mAP50-95 score of 0.7879, precision of 0.9063, and recall of 0.7503. These findings highlight YOLOv7's potential to accurately detect knife-related hazards, supporting the development of improved kitchen safety.
https://arxiv.org/abs/2501.05399
Unlike human-engineered systems such as aeroplanes, where each component's role and dependencies are well understood, the inner workings of AI models remain largely opaque, hindering verifiability and undermining trust. This paper introduces SemanticLens, a universal explanation method for neural networks that maps hidden knowledge encoded by components (e.g., individual neurons) into the semantically structured, multimodal space of a foundation model such as CLIP. In this space, unique operations become possible, including (i) textual search to identify neurons encoding specific concepts, (ii) systematic analysis and comparison of model representations, (iii) automated labelling of neurons and explanation of their functional roles, and (iv) audits to validate decision-making against requirements. Fully scalable and operating without human input, SemanticLens is shown to be effective for debugging and validation, summarizing model knowledge, aligning reasoning with expectations (e.g., adherence to the ABCDE-rule in melanoma classification), and detecting components tied to spurious correlations and their associated training data. By enabling component-level understanding and validation, the proposed approach helps bridge the "trust gap" between AI models and traditional engineered systems. We provide code for SemanticLens on this https URL and a demo on this https URL.
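Operation (i), textual search, reduces to nearest-neighbour lookup once every neuron has an embedding in the shared space. A small sketch under that assumption (how the neuron embeddings are obtained, e.g. by encoding each neuron's most-activating examples, is abstracted away):

```python
import numpy as np

def find_neurons(concept_emb, neuron_embs, top_k=5):
    # concept_emb: (d,) text embedding of a concept in a CLIP-like space
    # neuron_embs: (n_neurons, d) one embedding per network component
    n = neuron_embs / np.linalg.norm(neuron_embs, axis=1, keepdims=True)
    c = concept_emb / np.linalg.norm(concept_emb)
    scores = n @ c                              # cosine similarity per neuron
    idx = np.argsort(-scores)[:top_k]
    return list(zip(idx.tolist(), scores[idx].tolist()))
```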
https://arxiv.org/abs/2501.05398
Large language models (LLMs) have demonstrated significant capability in code generation, drawing increasing attention to the evaluation of the quality and safety of their outputs. However, research on bias in code generation remains limited. Existing studies typically assess bias by applying malicious prompts or by reusing tasks and datasets designed for discriminative models. Given that LLMs are often aligned with human values and that prior datasets are not fully optimized for code-related tasks, there is a pressing need for benchmarks specifically designed for evaluating code models. In this study, we introduce FairCode, a novel benchmark for evaluating bias in code generation. FairCode comprises two tasks: function implementation and test case generation, each evaluating social bias through diverse scenarios. Additionally, we propose a new metric, FairScore, to assess model performance on this benchmark. We conduct experiments on widely used LLMs and provide a comprehensive analysis of the results. The findings reveal that all tested LLMs exhibit bias. The code is available at this https URL.
https://arxiv.org/abs/2501.05396
Every maneuver of a vehicle redistributes risks between road users. While human drivers do this intuitively, autonomous vehicles allow and require deliberative algorithmic risk management. But how should traffic risks be distributed among road users? In a global experimental study in eight countries with different cultural backgrounds and almost 11,000 participants, we compared risk distribution preferences. It turns out that risk preferences in road traffic are strikingly similar between the cultural zones. The vast majority of participants in all countries deviates from a guiding principle of minimizing accident probabilities in favor of weighing up the probability and severity of accidents. At the national level, the consideration of accident probability and severity hardly differs between countries. The social dilemma of autonomous vehicles detected in deterministic crash scenarios disappears in risk assessments of everyday traffic situations in all countries. In no country do cyclists receive a risk bonus that goes beyond their higher vulnerability. In sum, our results suggest that a global consensus on the risk ethics of autonomous driving is easier to establish than on the ethics of crashing.
https://arxiv.org/abs/2501.05391
This paper explores ideas and provides a potential roadmap for the development and evaluation of physics-specific large-scale AI models, which we call Large Physics Models (LPMs). These models, based on foundation models such as Large Language Models (LLMs) - trained on broad data - are tailored to address the demands of physics research. LPMs can function independently or as part of an integrated framework. This framework can incorporate specialized tools, including symbolic reasoning modules for mathematical manipulations, frameworks to analyse specific experimental and simulated data, and mechanisms for synthesizing theories and scientific literature. We begin by examining whether the physics community should actively develop and refine dedicated models, rather than relying solely on commercial LLMs. We then outline how LPMs can be realized through interdisciplinary collaboration among experts in physics, computer science, and philosophy of science. To integrate these models effectively, we identify three key pillars: Development, Evaluation, and Philosophical Reflection. Development focuses on constructing models capable of processing physics texts, mathematical formulations, and diverse physical data. Evaluation assesses accuracy and reliability by testing and benchmarking. Finally, Philosophical Reflection encompasses the analysis of broader implications of LLMs in physics, including their potential to generate new scientific understanding and what novel collaboration dynamics might arise in research. Inspired by the organizational structure of experimental collaborations in particle physics, we propose a similarly interdisciplinary and collaborative approach to building and refining Large Physics Models. This roadmap provides specific objectives, defines pathways to achieve them, and identifies challenges that must be addressed to realise physics-specific large scale AI models.
https://arxiv.org/abs/2501.05382
Inspired by the effectiveness of 3D Gaussian Splatting (3DGS) in reconstructing detailed 3D scenes within multi-view setups and the emergence of large 2D human foundation models, we introduce Arc2Avatar, the first SDS-based method utilizing a human face foundation model as guidance with just a single image as input. To achieve that, we extend such a model for diverse-view human head generation by fine-tuning on synthetic data and modifying its conditioning. Our avatars maintain a dense correspondence with a human face mesh template, allowing blendshape-based expression generation. This is achieved through a modified 3DGS approach, connectivity regularizers, and a strategic initialization tailored for our task. Additionally, we propose an optional efficient SDS-based correction step to refine the blendshape expressions, enhancing realism and diversity. Experiments demonstrate that Arc2Avatar achieves state-of-the-art realism and identity preservation, effectively addressing color issues by allowing the use of very low guidance, enabled by our strong identity prior and initialization strategy, without compromising detail.
https://arxiv.org/abs/2501.05379
Virtual Try-On (VTON) has become a crucial tool in e-commerce, enabling the realistic simulation of garments on individuals while preserving their original appearance and pose. Early VTON methods relied on single generative networks, but challenges remain in preserving fine-grained garment details due to limitations in feature extraction and fusion. To address these issues, recent approaches have adopted a dual-network paradigm, incorporating a complementary "ReferenceNet" to enhance garment feature extraction and fusion. While effective, this dual-network approach introduces significant computational overhead, limiting its scalability for high-resolution and long-duration image/video VTON applications. In this paper, we challenge the dual-network paradigm by proposing a novel single-network VTON method that overcomes the limitations of existing techniques. Our method, namely MNVTON, introduces a Modality-specific Normalization strategy that separately processes text, image and video inputs, enabling them to share the same attention layers in a VTON network. Extensive experimental results demonstrate the effectiveness of our approach, showing that it consistently achieves higher-quality, more detailed results for both image and video VTON tasks. Our results suggest that the single-network paradigm can rival the performance of dual-network approaches, offering a more efficient alternative for high-quality, scalable VTON applications.
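A minimal PyTorch sketch of the normalization idea: per-modality LayerNorms feeding one shared attention layer. This is an assumed reading of "Modality-specific Normalization", not the released MNVTON architecture:

```python
import torch
import torch.nn as nn

class ModalitySpecificBlock(nn.Module):
    def __init__(self, dim=512, modalities=("text", "image", "video")):
        super().__init__()
        # One normalization per modality, so each input type keeps its
        # own statistics...
        self.norms = nn.ModuleDict({m: nn.LayerNorm(dim) for m in modalities})
        # ...while the attention layer itself is shared across modalities.
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, tokens, modality):
        x = self.norms[modality](tokens)
        out, _ = self.attn(x, x, x)
        return out
```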
https://arxiv.org/abs/2501.05369
At the risk of overstating the case, connectionist approaches to machine learning, i.e. neural networks, are enjoying a small vogue right now. However, these methods require large volumes of data and produce models that are uninterpretable to humans. An alternative framework that is compatible with neural networks and gradient-based learning, but explicitly models compositionality, is Vector Symbolic Architectures (VSAs). VSAs are a family of algebras on high-dimensional vector representations. They arose in cognitive science from the need to unify neural processing and the kind of symbolic reasoning that humans perform. While machine learning methods have benefited from category theoretical analyses, VSAs have not yet received similar treatment. In this paper, we present a first attempt at applying category theory to VSAs. Specifically, we conduct a brief literature survey demonstrating the lacking intersection of these two topics, provide a list of desiderata for VSAs, and propose that VSAs may be understood as a (division) rig in a category enriched over a monoid in Met (the category of Lawvere metric spaces). This final contribution suggests that VSAs may be generalised beyond current implementations. It is our hope that grounding VSAs in category theory will lead to more rigorous connections with other research, both within and beyond, learning and cognition.
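For readers unfamiliar with VSAs, a tiny numpy example of one member of the family (holographic reduced representations, where binding is circular convolution) shows the kind of algebra being analysed; the category-theoretic treatment in the paper abstracts over such concrete choices:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4096
role, apple, pear = (rng.standard_normal(d) / np.sqrt(d) for _ in range(3))

def bind(x, y):
    # Binding via circular convolution, computed in the Fourier domain.
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)))

def unbind(s, x):
    # Approximate inverse: circular correlation recovers x's partner.
    return np.real(np.fft.ifft(np.fft.fft(s) * np.conj(np.fft.fft(x))))

memory = bind(role, apple)              # encode the pair (role, apple)
probe = unbind(memory, role)            # noisy reconstruction of `apple`
print(probe @ apple > probe @ pear)     # True with high probability
```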
https://arxiv.org/abs/2501.05368
Large reasoning models (LRMs) like OpenAI-o1 have demonstrated impressive long stepwise reasoning capabilities through large-scale reinforcement learning. However, their extended reasoning processes often suffer from knowledge insufficiency, leading to frequent uncertainties and potential errors. To address this limitation, we introduce \textbf{Search-o1}, a framework that enhances LRMs with an agentic retrieval-augmented generation (RAG) mechanism and a Reason-in-Documents module for refining retrieved documents. Search-o1 integrates an agentic search workflow into the reasoning process, enabling dynamic retrieval of external knowledge when LRMs encounter uncertain knowledge points. Additionally, due to the verbose nature of retrieved documents, we design a separate Reason-in-Documents module to deeply analyze the retrieved information before injecting it into the reasoning chain, minimizing noise and preserving coherent reasoning flow. Extensive experiments on complex reasoning tasks in science, mathematics, and coding, as well as six open-domain QA benchmarks, demonstrate the strong performance of Search-o1. This approach enhances the trustworthiness and applicability of LRMs in complex reasoning tasks, paving the way for more reliable and versatile intelligent systems. The code is available at \url{this https URL}.
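Schematically, the workflow interleaves generation with retrieval. The sketch below uses hypothetical `llm` and `retriever` callables and an illustrative search-marker convention; the paper's actual prompting and module interfaces are not shown in the abstract:

```python
def search_o1(question, llm, retriever, max_steps=8):
    context = question
    for _ in range(max_steps):
        step = llm(context)
        if "<|search|>" in step:
            # The model signalled an uncertain knowledge point: retrieve,
            # then distill the verbose documents before injecting them
            # (the Reason-in-Documents role).
            query = step.split("<|search|>")[1].split("<|end|>")[0]
            docs = retriever(query)
            distilled = llm(f"Keep only facts relevant to: {query}\n{docs}")
            context += step + f"\n[retrieved]: {distilled}\n"
        else:
            return context + step       # no uncertainty left; final answer
    return context
```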
https://arxiv.org/abs/2501.05366
Corrigibility of autonomous agents is an under-explored part of system design, with previous work focusing on single-agent systems. It has been suggested that uncertainty over human preferences acts to keep agents corrigible, even in the face of human irrationality. We present a general framework for modelling corrigibility in a multi-agent setting as a two-player game in which the agents always have a move with which they can ask the human for supervision. This is formulated as a Bayesian game for the purpose of introducing uncertainty over the human's beliefs. We further analyse two specific cases. First, a two-player corrigibility game, in which we want corrigibility displayed by both agents, for both common-payoff (monotone) games and harmonic games. Then we investigate an adversarial setting, in which one agent is considered to be a `defending' agent and the other an `adversary'. A general result is provided for what beliefs over the games and over human rationality the defending agent is required to hold in order to induce corrigibility.
https://arxiv.org/abs/2501.05360
With advances in diffusion models, image generation has shown significant performance improvements. This raises concerns about the potential abuse of image generation, such as the creation of explicit or violent images, commonly referred to as Not Safe For Work (NSFW) content. To address this, the Stable Diffusion model includes several safety checkers to censor initial text prompts and final output images generated from the model. However, recent research has shown that these safety checkers are vulnerable to adversarial attacks, allowing attackers to bypass them and generate NSFW images. In this paper, we find that these adversarial attacks are not robust to small changes in text prompts or input latents. Based on this, we propose CROPS (Circular or RandOm Prompts for Safety), a model-agnostic framework that easily defends against adversarial attacks generating NSFW images without requiring additional training. Moreover, we develop an approach that utilizes one-step diffusion models for efficient NSFW detection (CROPS-1), further reducing computational resources. We demonstrate the superiority of our method in terms of performance and applicability.
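The observed brittleness suggests a simple defense: re-check the same generation request under several small perturbations and aggregate. A minimal sketch of that intuition, with `detector` (the safety checker) and `perturb` (a circular or random prompt/latent perturbation) as hypothetical callables:

```python
def crops_flag(latents, prompt, detector, perturb, n_votes=8, threshold=0.5):
    # A benign input stays benign under small perturbations, while an
    # adversarial one tends to lose its carefully tuned evasion, so a
    # majority vote over perturbed copies exposes it.
    votes = sum(bool(detector(*perturb(latents, prompt))) for _ in range(n_votes))
    return votes / n_votes >= threshold
```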
https://arxiv.org/abs/2501.05359
The co-design of neural network architectures, quantization precisions, and hardware accelerators offers a promising approach to achieving an optimal balance between performance and efficiency, particularly for model deployment on resource-constrained edge devices. In this work, we propose the JAQ Framework, which jointly optimizes the three critical dimensions. However, effectively automating the design process across the vast search space of those three dimensions poses significant challenges, especially when pursuing extremely low-bit quantization. Specifically, the primary challenges include: (1) memory overhead on the software side: low-precision quantization-aware training can lead to significant memory usage due to storing large intermediate features and latent weights for back-propagation, potentially causing memory exhaustion; and (2) time-consuming search on the hardware side: the discrete nature of hardware parameters and the complex interplay between compiler optimizations and individual operators make the accelerator search time-consuming. To address these issues, JAQ mitigates the memory overhead through a channel-wise sparse quantization (CSQ) scheme, selectively applying quantization to the most sensitive components of the model during optimization. Additionally, JAQ designs BatchTile, which employs a hardware generation network to encode all possible tiling modes, thereby speeding up the search for the optimal compiler mapping strategy. Extensive experiments demonstrate the effectiveness of JAQ, achieving approximately 7% higher Top-1 accuracy on ImageNet compared to previous methods and reducing the hardware search time per iteration to 0.15 seconds.
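A toy numpy sketch of the CSQ idea as described: during optimization, only the most sensitive fraction of channels is put through low-bit fake quantization. The sensitivity scores and the uniform symmetric quantizer are assumptions for illustration:

```python
import numpy as np

def csq_quantize(weights, sensitivity, frac=0.25, n_bits=2):
    # weights: (out_channels, ...) layer weights
    # sensitivity: (out_channels,) per-channel sensitivity scores
    k = max(1, int(frac * weights.shape[0]))
    chosen = np.argsort(-sensitivity)[:k]        # most sensitive channels
    q = weights.copy()
    qmax = 2 ** (n_bits - 1) - 1
    for c in chosen:
        w = weights[c]
        scale = max(np.abs(w).max(), 1e-8) / qmax
        q[c] = np.round(w / scale) * scale       # uniform symmetric quantization
    return q
```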
https://arxiv.org/abs/2501.05339
The rapid advancement of large language models (LLMs) has led to significant improvements in their capabilities, but also to increased concerns about their alignment with human values and intentions. Current alignment strategies, including adaptive training and inference-time methods, have demonstrated potential in this area. However, these approaches still struggle to balance deployment complexity and capability across various tasks and difficulties. In this work, we introduce the Streaming Distribution Induce Aligner (Stream Aligner), a novel alignment paradigm that combines efficiency with enhanced performance in various tasks throughout the generation process. Stream Aligner achieves dynamic sentence-level correction by using a small model to learn the preferences of the suffix sentence, iteratively correcting the suffix sentence output by the upstream model, and then using the corrected sentence to replace the suffix sentence in subsequent generations. Compared to Aligner, our experiments demonstrate that Stream Aligner reduces reliance on the capabilities of additional models, enhances the reasoning abilities of LLMs, and decreases latency during user interaction. Specifically, the Stream Aligner-2B model achieved an improvement of 76.1% in helpfulness and 36.0% in harmlessness on the tested Llama2-70B-chat model, and Stream Aligner-8B achieved an improvement of 3.5% in the math ability of the tested Llama3-70B-Instruct model.
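The correction loop itself is compact. A sketch with hypothetical `upstream` (the base LLM) and `aligner` (the small preference model) callables and a simplified stop condition:

```python
def stream_align(prompt, upstream, aligner, max_sentences=32):
    output = ""
    for _ in range(max_sentences):
        draft = upstream(prompt + output)         # propose the next sentence
        fixed = aligner(prompt + output, draft)   # small model corrects it
        output += fixed                           # corrected text replaces the draft
        if not fixed or fixed.endswith("</s>"):   # hypothetical stop condition
            break
    return output
```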
https://arxiv.org/abs/2501.05336
We study strategic location choice by customers and sellers, termed the Bakers and Millers Game in the literature. In our generalized setting, each miller can freely choose any location for setting up a mill, while each baker is restricted in the choice of location for setting up a bakery. For optimal bargaining power, a baker would like to select a location with many millers to buy flour from and with little competition from other bakers. Likewise, a miller aims for a location with many bakers and few competing millers. Thus, both types of agents choose locations to optimize the ratio of agents of opposite type divided by agents of the same type at their chosen location. Originally raised in the context of Fractional Hedonic Games, the Bakers and Millers Game has applications that range from commerce to product design. We study the impact of location restrictions on the properties of the game. While pure Nash equilibria trivially exist in the setting without location restrictions, we show via a sophisticated, efficient algorithm that even the more challenging restricted setting admits equilibria. Moreover, the computed equilibrium approximates the optimal social welfare by a factor of at most $2\left(\frac{e}{e-1}\right)$. Furthermore, we give tight bounds on the price of anarchy/stability. On the conceptual side, the location choice feature adds a new layer to the standard setting of Hedonic Games, in the sense that agents that select the same location form a coalition. This allows to naturally restrict the possible coalitions that can be formed. With this, our model generalizes simple symmetric Fractional Hedonic Games on complete bipartite valuation graphs and also Hedonic Diversity Games with utilities single-peaked at 0. We believe that this generalization is also a very interesting direction for other types of Hedonic Games.
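The utility each agent optimizes is just a headcount ratio at its chosen location. A toy example (agent names and the instance are illustrative):

```python
def ratio_utility(agent_type, location, placement):
    # placement maps agent -> (type, location); the agent itself is among
    # the placed agents, so the same-type count is at least 1.
    here = [t for (t, loc) in placement.values() if loc == location]
    same = here.count(agent_type)
    return (len(here) - same) / same

placement = {  # three millers and two bakers over locations A and B
    "m1": ("miller", "A"), "m2": ("miller", "A"), "m3": ("miller", "B"),
    "b1": ("baker", "A"), "b2": ("baker", "B"),
}
print(ratio_utility("baker", "A", placement))    # 2 millers / 1 baker  = 2.0
print(ratio_utility("miller", "A", placement))   # 1 baker  / 2 millers = 0.5
```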
https://arxiv.org/abs/2501.05334