Bundle adjustment (BA) is a critical technique in various robotic applications, such as simultaneous localization and mapping (SLAM), augmented reality (AR), and photogrammetry. BA optimizes parameters such as camera poses and 3D landmarks to align them with observations. With the growing importance of deep learning in perception systems, there is an increasing need to integrate BA with deep learning frameworks for enhanced reliability and performance. However, widely-used C++-based BA frameworks, such as GTSAM, g$^2$o, and Ceres, lack native integration with modern deep learning libraries like PyTorch. This limitation affects their flexibility, adaptability, ease of debugging, and overall implementation efficiency. To address this gap, we introduce an eager-mode BA framework seamlessly integrated with PyPose, providing PyTorch-compatible interfaces with high efficiency. Our approach includes GPU-accelerated, differentiable, and sparse operations designed for 2nd-order optimization, Lie group and Lie algebra operations, and linear solvers. Our eager-mode BA on GPU demonstrates substantial runtime efficiency, achieving an average speedup of 18.5$\times$, 22$\times$, and 23$\times$ compared to GTSAM, g$^2$o, and Ceres, respectively.
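Bundle adjustment (BA) is a key technique widely used in robotic applications such as simultaneous localization and mapping (SLAM), augmented reality (AR), and photogrammetry. BA optimizes parameters such as camera poses and 3D landmarks so that they agree with the observations. As deep learning grows in importance for perception systems, there is an increasing need to integrate BA with deep learning frameworks to improve reliability and performance. However, widely used C++-based BA frameworks such as GTSAM, g$^2$o, and Ceres lack native integration with modern deep learning libraries like PyTorch. This limits their flexibility, adaptability, ease of debugging, and overall implementation efficiency. To fill this gap, we introduce an eager-mode BA framework seamlessly integrated with PyPose, offering efficient PyTorch-compatible interfaces. Our approach includes GPU-accelerated, differentiable, sparse operations designed for second-order optimization, Lie group and Lie algebra operations, and linear solvers. On GPU, our eager-mode BA is substantially faster, with average speedups of 18.5$\times$, 22$\times$, and 23$\times$ over GTSAM, g$^2$o, and Ceres, respectively.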
https://arxiv.org/abs/2409.12190
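As a rough illustration of the objective that bundle adjustment minimizes in the abstract above, the following sketch evaluates the sum of squared reprojection errors for a pinhole camera. It is plain Python rather than the PyPose interface the paper describes; the function names, the `(R, t)` pose layout, and the intrinsics tuple are assumptions made for this example only.

```python
def transform(R, t, p):
    """Apply a rigid transform: R (3x3 nested lists) times p, plus t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def project(point_cam, fx, fy, cx, cy):
    """Pinhole projection of a 3D point already expressed in the camera frame."""
    x, y, z = point_cam
    return (fx * x / z + cx, fy * y / z + cy)


def reprojection_cost(poses, landmarks, observations, intrinsics):
    """Sum of squared pixel residuals over all (camera, landmark, pixel) observations.

    poses: list of (R, t) world-to-camera transforms (hypothetical layout).
    landmarks: list of 3D points in the world frame.
    observations: list of (camera_index, landmark_index, (u, v)).
    """
    fx, fy, cx, cy = intrinsics
    cost = 0.0
    for cam_idx, lm_idx, (u, v) in observations:
        R, t = poses[cam_idx]
        u_hat, v_hat = project(transform(R, t, landmarks[lm_idx]), fx, fy, cx, cy)
        cost += (u_hat - u) ** 2 + (v_hat - v) ** 2
    return cost
```

A BA solver would adjust `poses` and `landmarks` to drive this cost down, typically with a second-order method such as Levenberg-Marquardt; the sketch only evaluates the cost.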
Mixture-of-Experts (MoE) models scale more effectively than dense models due to sparse computation through expert routing, selectively activating only a small subset of expert modules. However, sparse computation challenges traditional training practices, as discrete expert routing hinders standard backpropagation and thus gradient-based optimization, which are the cornerstone of deep learning. To better pursue the scaling power of MoE, we introduce GRIN (GRadient-INformed MoE training), which incorporates sparse gradient estimation for expert routing and configures model parallelism to avoid token dropping. Applying GRIN to autoregressive language modeling, we develop a top-2 16$\times$3.8B MoE model. Our model, with only 6.6B activated parameters, outperforms a 7B dense model and matches the performance of a 14B dense model trained on the same data. Extensive evaluations across diverse tasks demonstrate the potential of GRIN to significantly enhance MoE efficacy, achieving 79.4 on MMLU, 83.7 on HellaSwag, 74.4 on HumanEval, and 58.9 on MATH.
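Mixture-of-Experts (MoE) models scale more effectively than dense models because expert routing yields sparse computation, activating only a small subset of expert modules. However, sparse computation challenges traditional training practice: discrete expert routing hinders standard backpropagation, and therefore the gradient-based optimization that is the cornerstone of deep learning. To better pursue the scaling power of MoE, we introduce GRIN (GRadient-INformed MoE training), which incorporates sparse gradient estimation for expert routing and configures model parallelism to avoid token dropping. Applying GRIN to autoregressive language modeling, we develop a top-2 16$\times$3.8B MoE model. With only 6.6B activated parameters, our model outperforms a 7B dense model and matches a 14B dense model trained on the same data. Extensive evaluations across diverse tasks demonstrate GRIN's potential to significantly improve MoE efficacy, with scores of 79.4 on MMLU, 83.7 on HellaSwag, 74.4 on HumanEval, and 58.9 on MATH.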
https://arxiv.org/abs/2409.12136
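To make the "top-2" expert routing in the abstract above concrete, here is a minimal sketch of a sparse forward pass for one token. It assumes the two selected router scores are renormalized with a softmax, which is one common convention and not necessarily the paper's; it also does not implement GRIN's sparse gradient estimator. All names are illustrative.

```python
import math


def top2_route(logits):
    """Return [(expert_index, weight), ...] for the two highest-scoring experts.

    Weights are a softmax over the two selected logits only (an assumption).
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:2]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - m) for i in top]
    z = sum(exps)
    return [(i, e / z) for i, e in zip(top, exps)]


def moe_forward(x, router_logits, experts):
    """Evaluate only the two routed experts and mix their outputs."""
    return sum(w * experts[i](x) for i, w in top2_route(router_logits))
```

The selection step (`sorted(...)[:2]`) is the discrete operation that blocks ordinary backpropagation through the router, which is the difficulty the abstract refers to.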
Teams of mobile [aerial, ground, or aquatic] robots have applications in resource delivery, patrolling, information-gathering, agriculture, forest fire fighting, chemical plume source localization and mapping, and search-and-rescue. Robot teams traversing hazardous environments -- with e.g. rough terrain or seas, strong winds, or adversaries capable of attacking or capturing robots -- should plan and coordinate their trails in consideration of risks of disablement, destruction, or capture. Specifically, the robots should take the safest trails, coordinate their trails to cooperatively achieve the team-level objective with robustness to robot failures, and balance the reward from visiting locations against risks of robot losses. Herein, we consider bi-objective trail-planning for a mobile team of robots orienteering in a hazardous environment. The hazardous environment is abstracted as a directed graph whose arcs, when traversed by a robot, present known probabilities of survival. Each node of the graph offers a reward to the team if visited by a robot (which e.g. delivers a good to or images the node). We wish to search for the Pareto-optimal robot-team trail plans that maximize two [conflicting] team objectives: the expected (i) team reward and (ii) number of robots that survive the mission. A human decision-maker can then select trail plans that balance, according to their values, reward and robot survival. We implement ant colony optimization, guided by heuristics, to search for the Pareto-optimal set of robot team trail plans. As a case study, we illustrate with an information-gathering mission in an art museum.
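Teams of mobile (aerial, ground, or aquatic) robots have applications in resource delivery, patrolling, information gathering, agriculture, forest fire fighting, chemical plume source localization and mapping, and search and rescue. Robot teams traversing hazardous environments (for example, with rough terrain or seas, strong winds, or adversaries capable of attacking or capturing robots) should plan and coordinate their trails with the risks of disablement, destruction, or capture in mind. Specifically, the robots should take the safest trails, coordinate their trails so that the team-level objective is achieved robustly despite robot failures, and balance the reward from visiting locations against the risk of losing robots. Here we consider bi-objective trail planning for a team of mobile robots orienteering in a hazardous environment. The environment is abstracted as a directed graph whose arcs, when traversed by a robot, carry known probabilities of survival. Each node offers a reward to the team if a robot visits it (for example, to deliver a good to the node or to image it). We search for the Pareto-optimal team trail plans that maximize two conflicting team objectives: the expected team reward and the expected number of robots that survive the mission. A human decision-maker can then select a trail plan that balances reward and robot survival according to their values. We implement ant colony optimization, guided by heuristics, to search for the Pareto-optimal set of team trail plans. As a case study, we illustrate the approach with an information-gathering mission in an art museum.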
https://arxiv.org/abs/2409.12114
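The two objectives in the abstract above can be evaluated for a candidate plan as sketched below. The sketch assumes that robots survive arcs independently, that a node's reward is collected once if at least one robot reaches it alive, and that plans are compared as `(reward, survivors)` tuples; these modelling details and all names are assumptions for illustration, and the ant colony search itself is not shown.

```python
def reach_probabilities(trail, survival):
    """Return ({node: P(robot reaches node alive)}, P(robot survives whole trail))."""
    prob, reach = 1.0, {trail[0]: 1.0}
    for a, b in zip(trail, trail[1:]):
        prob *= survival[(a, b)]      # survive arc (a, b)
        reach.setdefault(b, prob)     # only the first visit matters for reward
    return reach, prob


def team_objectives(trails, survival, reward):
    """Expected team reward and expected number of surviving robots."""
    reaches = [reach_probabilities(t, survival) for t in trails]
    expected_survivors = sum(p for _, p in reaches)
    expected_reward = 0.0
    for node, r in reward.items():
        miss = 1.0                    # probability that no robot reaches the node
        for reach, _ in reaches:
            miss *= 1.0 - reach.get(node, 0.0)
        expected_reward += r * (1.0 - miss)
    return expected_reward, expected_survivors


def pareto_front(plans):
    """Keep the (reward, survivors) pairs not dominated by any other pair."""
    def dominates(p, q):
        return p[0] >= q[0] and p[1] >= q[1] and p != q
    return [p for p in plans if not any(dominates(q, p) for q in plans)]
```

A search procedure would generate many candidate `trails`, score each with `team_objectives`, and present the `pareto_front` to the decision-maker.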
In practical use cases, polygonal mesh editing can be faster than generating new ones, but it can still be challenging and time-consuming for users. Existing solutions for this problem tend to focus on a single task, either geometry or novel view synthesis, which often leads to disjointed results between the mesh and view. In this work, we propose LEMON, a mesh editing pipeline that combines neural deferred shading with localized mesh optimization. Our approach begins by identifying the most important vertices in the mesh for editing, utilizing a segmentation model to focus on these key regions. Given multi-view images of an object, we optimize a neural shader and a polygonal mesh while extracting the normal map and the rendered image from each view. By using these outputs as conditioning data, we edit the input images with a text-to-image diffusion model and iteratively update our dataset while deforming the mesh. This process results in a polygonal mesh that is edited according to the given text instruction, preserving the geometric characteristics of the initial mesh while focusing on the most significant areas. We evaluate our pipeline using the DTU dataset, demonstrating that it generates finely-edited meshes more rapidly than the current state-of-the-art methods. We include our code and additional results in the supplementary material.
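In practical use cases, editing a polygonal mesh can be faster than generating a new one, but it is still challenging and time-consuming for users. Existing solutions tend to focus on a single task, either geometry or novel view synthesis, which often leads to disjointed results between the mesh and the views. In this work, we propose LEMON, a mesh editing pipeline that combines neural deferred shading with localized mesh optimization. Our approach first identifies the vertices of the mesh that matter most for the edit, using a segmentation model to focus on these key regions. Given multi-view images of an object, we optimize a neural shader and a polygonal mesh while extracting the normal map and the rendered image from each view. Using these outputs as conditioning data, we edit the input images with a text-to-image diffusion model and iteratively update our dataset while deforming the mesh. The result is a polygonal mesh edited according to the given text instruction that preserves the geometric characteristics of the initial mesh while focusing on the most significant areas. We evaluate our pipeline on the DTU dataset and show that it generates finely edited meshes more rapidly than current state-of-the-art methods. Our code and additional results are included in the supplementary material.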
https://arxiv.org/abs/2409.12024
To address the intricate challenges of decentralized cooperative scheduling and motion planning in Autonomous Mobility-on-Demand (AMoD) systems, this paper introduces LMMCoDrive, a novel cooperative driving framework that leverages a Large Multimodal Model (LMM) to enhance traffic efficiency in dynamic urban environments. This framework seamlessly integrates scheduling and motion planning processes to ensure the effective operation of Cooperative Autonomous Vehicles (CAVs). The spatial relationship between CAVs and passenger requests is abstracted into a Bird's-Eye View (BEV) to fully exploit the potential of the LMM. Besides, trajectories are cautiously refined for each CAV while ensuring collision avoidance through safety constraints. A decentralized optimization strategy, facilitated by the Alternating Direction Method of Multipliers (ADMM) within the LMM framework, is proposed to drive the graph evolution of CAVs. Simulation results demonstrate the pivotal role and significant impact of LMM in optimizing CAV scheduling and enhancing decentralized cooperative optimization process for each vehicle. This marks a substantial stride towards achieving practical, efficient, and safe AMoD systems that are poised to revolutionize urban transportation. The code is available at this https URL.
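To address the intricate challenges of decentralized cooperative scheduling and motion planning in Autonomous Mobility-on-Demand (AMoD) systems, this paper introduces LMMCoDrive, a novel cooperative driving framework that uses a Large Multimodal Model (LMM) to improve traffic efficiency in dynamic urban environments. The framework seamlessly integrates scheduling and motion planning to ensure the effective operation of Cooperative Autonomous Vehicles (CAVs). The spatial relationship between CAVs and passenger requests is abstracted into a Bird's-Eye View (BEV) to fully exploit the potential of the LMM. In addition, the trajectory of each CAV is carefully refined while safety constraints ensure collision avoidance. A decentralized optimization strategy, facilitated by the Alternating Direction Method of Multipliers (ADMM) within the LMM framework, is proposed to drive the graph evolution of the CAVs. Simulation results demonstrate the pivotal role and significant impact of the LMM in optimizing CAV scheduling and improving the decentralized cooperative optimization process for each vehicle. This marks a substantial step toward practical, efficient, and safe AMoD systems that are poised to revolutionize urban transportation. The code is available at this https URL.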
https://arxiv.org/abs/2409.11981
Reactive collision avoidance is essential for agile robots navigating complex and dynamic environments, enabling real-time obstacle response. However, this task is inherently challenging because it requires a tight integration of perception, planning, and control, which traditional methods often handle separately, resulting in compounded errors and delays. This paper introduces a novel approach that unifies these tasks into a single reactive framework using solely onboard sensing and computing. Our method combines nonlinear model predictive control with adaptive control barrier functions, directly linking perception-driven constraints to real-time planning and control. Constraints are determined by using a neural network to refine noisy RGB-D data, enhancing depth accuracy, and selecting points with the minimum time-to-collision to prioritize the most immediate threats. To maintain a balance between safety and agility, a heuristic dynamically adjusts the optimization process, preventing overconstraints in real time. Extensive experiments with an agile quadrotor demonstrate effective collision avoidance across diverse indoor and outdoor environments, without requiring environment-specific tuning or explicit mapping.
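Reactive collision avoidance is essential for agile robots navigating complex and dynamic environments, as it enables real-time responses to obstacles. The task is inherently challenging, however, because it requires tight integration of perception, planning, and control, which traditional methods usually handle separately, resulting in compounded errors and delays. This paper introduces a novel approach that unifies these tasks in a single reactive framework using only onboard sensing and computing. Our method combines nonlinear model predictive control with adaptive control barrier functions, directly linking perception-driven constraints to real-time planning and control. The constraints are determined by using a neural network to refine noisy RGB-D data, which improves depth accuracy, and by selecting the points with the minimum time-to-collision so that the most immediate threats are prioritized. To balance safety and agility, a heuristic dynamically adjusts the optimization process and prevents over-constraining in real time. Extensive experiments with an agile quadrotor demonstrate effective collision avoidance across diverse indoor and outdoor environments, without environment-specific tuning or explicit mapping.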
https://arxiv.org/abs/2409.11962
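The abstract above selects the points with the minimum time-to-collision as constraint candidates. A minimal sketch of that selection step follows, assuming static obstacle points expressed relative to the robot and a constant robot velocity; the paper's actual time-to-collision definition, the depth-refinement network, and the control barrier function formulation are not reproduced here, and all names are hypothetical.

```python
import math


def time_to_collision(point, velocity):
    """Distance to a static point divided by the robot's closing speed toward it.

    Returns math.inf when the robot is not approaching the point.
    """
    dist = math.sqrt(sum(c * c for c in point))
    if dist == 0.0:
        return 0.0
    # Component of the velocity along the unit vector pointing at the obstacle.
    closing = sum(p * v for p, v in zip(point, velocity)) / dist
    return dist / closing if closing > 0.0 else math.inf


def most_urgent_points(points, velocity, k):
    """Return the k points with the smallest time-to-collision."""
    return sorted(points, key=lambda p: time_to_collision(p, velocity))[:k]
```

Limiting the constraint set to the `k` most urgent points keeps the optimization problem small, which matters for a controller that must run in real time.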
This paper investigates payload grasping from a moving platform using a hook-equipped aerial manipulator. First, a computationally efficient trajectory optimization based on complementarity constraints is proposed to determine the optimal grasping time. To enable application in complex, dynamically changing environments, the future motion of the payload is predicted using physics simulator-based models. The success of payload grasping under model uncertainties and external disturbances is formally verified through a robustness analysis method based on integral quadratic constraints. The proposed algorithms are evaluated in a high-fidelity physical simulator, and in real flight experiments using a custom-designed aerial manipulator platform.
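This paper investigates payload grasping from a moving platform using a hook-equipped aerial manipulator. First, a computationally efficient trajectory optimization based on complementarity constraints is proposed to determine the optimal grasping time. To enable application in complex, dynamically changing environments, the future motion of the payload is predicted using physics simulator-based models. The success of payload grasping under model uncertainties and external disturbances is formally verified through a robustness analysis method based on integral quadratic constraints. The proposed algorithms are evaluated in a high-fidelity physical simulator and in real flight experiments using a custom-designed aerial manipulator platform.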
https://arxiv.org/abs/2409.11788
To mitigate the susceptibility of neural networks to adversarial attacks, adversarial training has emerged as a prevalent and effective defense strategy. Intrinsically, this countermeasure incurs a trade-off, as it sacrifices the model's accuracy in processing normal samples. To reconcile the trade-off, we pioneer the incorporation of null-space projection into adversarial training and propose two innovative Null-space Projection based Adversarial Training (NPAT) algorithms tackling sample generation and gradient optimization, named Null-space Projected Data Augmentation (NPDA) and Null-space Projected Gradient Descent (NPGD), to search for an overarching optimal solution that enhances robustness with almost zero deterioration in generalization performance. Adversarial samples and perturbations are constrained within the null-space of the decision boundary using a closed-form null-space projector, effectively mitigating the threat of attacks stemming from unreliable features. We then conduct experiments on the CIFAR10 and SVHN datasets and show that our methodology combines seamlessly with adversarial training methods and obtains comparable robustness while keeping generalization close to that of a high-accuracy model.
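Adversarial training has become a widespread and effective defense for reducing the susceptibility of neural networks to adversarial attacks. Intrinsically, this countermeasure incurs a trade-off, because it sacrifices the model's accuracy on normal samples. To reconcile the trade-off, we are the first to incorporate null-space projection into adversarial training and propose two novel Null-space Projection based Adversarial Training (NPAT) algorithms that address sample generation and gradient optimization, named Null-space Projected Data Augmentation (NPDA) and Null-space Projected Gradient Descent (NPGD), to search for an overall optimal solution that improves robustness with almost no loss in generalization performance. Adversarial samples and perturbations are constrained to the null space of the decision boundary by a closed-form null-space projector, which effectively reduces the threat of attacks that stem from unreliable features. Experiments on the CIFAR10 and SVHN datasets show that our method combines seamlessly with adversarial training methods and obtains comparable robustness while keeping generalization close to that of a high-accuracy model.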
https://arxiv.org/abs/2409.11754
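The idea of constraining a perturbation to a null space, mentioned in the abstract above, can be illustrated in the simplest setting: a linear score $f(x) = w \cdot x + b$. Removing the component of a perturbation along $w$ leaves the score unchanged. This is only a sketch under that linear, single-output assumption; the paper's closed-form projector for a network's decision boundary is not given in the abstract, and the function name is hypothetical.

```python
def project_to_null_space(delta, w):
    """Project delta onto the null space of the row vector w.

    Computes delta - (w . delta / ||w||^2) * w, so that w . result == 0.
    """
    scale = sum(d * wi for d, wi in zip(delta, w)) / sum(wi * wi for wi in w)
    return [d - scale * wi for d, wi in zip(delta, w)]
```

For example, with `w = [1.0, 2.0]` and `delta = [3.0, 1.0]`, the projected perturbation is orthogonal to `w`, so adding it to an input does not change `w . x + b`.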
Compressed sensing (CS) has emerged to overcome the inefficiency of Nyquist sampling. However, traditional optimization-based reconstruction is slow and cannot yield an exact image in practice. Deep learning-based reconstruction has been a promising alternative to optimization-based reconstruction, outperforming it in accuracy and computation speed. Finding an efficient sampling method for deep learning-based reconstruction, especially for Fourier CS, remains a challenge. Existing works on joint optimization of sampling-reconstruction (H1) optimize the sampling mask but have low potential, because the mask is not adaptive to each data point. Adaptive sampling (H2) also has disadvantages: difficult optimization and Pareto sub-optimality. Here, we propose a novel adaptive selection of sampling-reconstruction (H1.5) framework that selects the best sampling mask and reconstruction network for each input data. We provide theorems that our method has a higher potential than H1 and effectively solves the Pareto sub-optimality problem in sampling-reconstruction by using separate reconstruction networks for different sampling masks. To select the best sampling mask, we propose to quantify the high-frequency Bayesian uncertainty of the input, using a super-resolution space generation model. Our method outperforms joint optimization of sampling-reconstruction (H1) and adaptive sampling (H2) by achieving significant improvements on several Fourier CS problems.
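Compressed sensing (CS) emerged to overcome the inefficiency of Nyquist sampling. However, traditional optimization-based reconstruction is slow and cannot recover an exact image in practice. Deep learning-based reconstruction has become a promising alternative that outperforms optimization-based reconstruction in both accuracy and computation speed. Finding an efficient sampling method for deep learning-based reconstruction, especially for Fourier CS, remains a challenge. Existing works on joint optimization of sampling-reconstruction (H1) optimize the sampling mask but have low potential because the mask does not adapt to each data point. Adaptive sampling (H2) also has drawbacks, namely difficult optimization and Pareto sub-optimality. Here we propose a novel adaptive selection of sampling-reconstruction (H1.5) framework that selects the best sampling mask and reconstruction network for each input. We provide theorems showing that our method has higher potential than H1 and effectively solves the Pareto sub-optimality problem in sampling-reconstruction by using separate reconstruction networks for different sampling masks. To select the best sampling mask, we propose quantifying the high-frequency Bayesian uncertainty of the input with a super-resolution space generation model. Our method outperforms joint optimization of sampling-reconstruction (H1) and adaptive sampling (H2), with significant improvements on several Fourier CS problems.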
https://arxiv.org/abs/2409.11738
Few-shot class-incremental learning is crucial for developing scalable and adaptive intelligent systems, as it enables models to acquire new classes with minimal annotated data while safeguarding the previously accumulated knowledge. Nonetheless, existing methods deal with continuous data streams in a centralized manner, limiting their applicability in scenarios that prioritize data privacy and security. To this end, this paper introduces federated few-shot class-incremental learning, a decentralized machine learning paradigm tailored to progressively learn new classes from scarce data distributed across multiple clients. In this learning paradigm, clients locally update their models with new classes while preserving data privacy, and then transmit the model updates to a central server where they are aggregated globally. However, this paradigm faces several issues, such as difficulties in few-shot learning, catastrophic forgetting, and data heterogeneity. To address these challenges, we present a synthetic data-driven framework that leverages replay buffer data to maintain existing knowledge and facilitate the acquisition of new knowledge. Within this framework, a noise-aware generative replay module is developed to fine-tune local models with a balance of new and replay data, while generating synthetic data of new classes to further expand the replay buffer for future tasks. Furthermore, a class-specific weighted aggregation strategy is designed to tackle data heterogeneity by adaptively aggregating class-specific parameters based on local models performance on synthetic data. This enables effective global model optimization without direct access to client data. Comprehensive experiments across three widely-used datasets underscore the effectiveness and preeminence of the introduced framework.
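Few-shot class-incremental learning is crucial for developing scalable and adaptive intelligent systems, because it lets models acquire new classes from minimal annotated data while safeguarding previously accumulated knowledge. However, existing methods handle continuous data streams in a centralized manner, which limits their applicability where data privacy and security are priorities. To this end, this paper introduces federated few-shot class-incremental learning, a decentralized machine learning paradigm tailored to progressively learning new classes from scarce data distributed across multiple clients. In this paradigm, clients update their models locally with new classes while preserving data privacy, then transmit the model updates to a central server, where they are aggregated globally. The paradigm faces several issues, including the difficulty of few-shot learning, catastrophic forgetting, and data heterogeneity. To address these challenges, we present a synthetic data-driven framework that uses replay buffer data to maintain existing knowledge and facilitate the acquisition of new knowledge. Within this framework, a noise-aware generative replay module fine-tunes local models on a balance of new and replay data, while generating synthetic data of new classes to further expand the replay buffer for future tasks. In addition, a class-specific weighted aggregation strategy tackles data heterogeneity by adaptively aggregating class-specific parameters according to the local models' performance on synthetic data. This enables effective global model optimization without direct access to client data. Comprehensive experiments on three widely used datasets confirm the effectiveness and superiority of the proposed framework.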
https://arxiv.org/abs/2409.11657
Metaheuristic algorithms are essential for solving complex optimization problems in different fields. However, the difficulty in comparing and rating these algorithms remains due to the wide range of performance metrics and problem dimensions usually involved. On the other hand, nonparametric statistical methods and post hoc tests are time-consuming, especially when we only need to identify the top performers among many algorithms. The Hierarchical Rank Aggregation (HRA) algorithm aims to efficiently rank metaheuristic algorithms based on their performance across many criteria and dimensions. The HRA employs a hierarchical framework that begins with collecting performance metrics on various benchmark functions and dimensions. Rank-based normalization is employed for each performance measure to ensure comparability and the robust TOPSIS aggregation is applied to combine these rankings at several hierarchical levels, resulting in a comprehensive ranking of the algorithms. Our study uses data from the CEC 2017 competition to demonstrate the robustness and efficacy of the HRA framework. It examines 30 benchmark functions and evaluates the performance of 13 metaheuristic algorithms across five performance indicators in four distinct dimensions. This presentation highlights the potential of the HRA to enhance the interpretation of the comparative advantages and disadvantages of various algorithms by simplifying practitioners' choices of the most appropriate algorithm for certain optimization problems.
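Metaheuristic algorithms are essential for solving complex optimization problems in different fields. However, comparing and rating these algorithms remains difficult because of the wide range of performance metrics and problem dimensions usually involved. Nonparametric statistical methods and post hoc tests, on the other hand, are time-consuming, especially when we only need to identify the top performers among many algorithms. The Hierarchical Rank Aggregation (HRA) algorithm aims to rank metaheuristic algorithms efficiently based on their performance across many criteria and dimensions. HRA uses a hierarchical framework that begins by collecting performance metrics on various benchmark functions and dimensions. Rank-based normalization is applied to each performance measure to ensure comparability, and robust TOPSIS aggregation combines the rankings at several hierarchical levels, producing a comprehensive ranking of the algorithms. Our study uses data from the CEC 2017 competition to demonstrate the robustness and efficacy of the HRA framework. It examines 30 benchmark functions and evaluates 13 metaheuristic algorithms across five performance indicators in four distinct dimensions. The study highlights HRA's potential to improve the interpretation of the comparative advantages and disadvantages of various algorithms by simplifying practitioners' choice of the most appropriate algorithm for a given optimization problem.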
https://arxiv.org/abs/2409.11617
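One level of the rank-then-aggregate idea in the abstract above can be sketched as follows: each metric is converted to ranks, and the rank matrix is aggregated with textbook TOPSIS using equal weights. The paper's "robust TOPSIS" variant and its hierarchy over functions and dimensions are not specified in the abstract, so the standard formulation, the equal weighting, and all function names here are assumptions.

```python
import math


def ranks(values):
    """Average ranks with 1 = best (smallest value); ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def topsis_closeness(matrix):
    """matrix[a][c] is a cost-type score (lower is better); returns closeness in [0, 1]."""
    n_crit = len(matrix[0])
    norms = [math.sqrt(sum(row[c] ** 2 for row in matrix)) for c in range(n_crit)]
    scaled = [[row[c] / norms[c] for c in range(n_crit)] for row in matrix]
    best = [min(row[c] for row in scaled) for c in range(n_crit)]
    worst = [max(row[c] for row in scaled) for c in range(n_crit)]
    scores = []
    for row in scaled:
        d_best, d_worst = math.dist(row, best), math.dist(row, worst)
        total = d_best + d_worst
        scores.append(d_worst / total if total > 0.0 else 0.5)
    return scores


def aggregate(metric_table):
    """metric_table[m][a] is the error of algorithm a on metric m."""
    rank_cols = [ranks(col) for col in metric_table]
    return topsis_closeness([list(r) for r in zip(*rank_cols)])
```

A closeness of 1 means the algorithm is ranked best on every metric, and 0 means it is ranked worst on every metric; a hierarchical scheme would feed these scores into the next level of aggregation.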
Bounded rational agents often make decisions by evaluating a finite selection of choices, typically derived from a reference point termed the "default policy," based on previous experience. However, the inherent rigidity of the static default policy presents significant challenges for agents operating in unknown environments that are not included in the agent's prior knowledge. In this work, we introduce a context-generative default policy that leverages the region observed by the robot to predict the unobserved part of the environment, thereby enabling the robot to adaptively adjust its default policy based on both the actual observed map and the $\textit{imagined}$ unobserved map. Furthermore, the adaptive nature of the bounded rationality framework enables the robot to manage unreliable or incorrect imaginations by selectively sampling a few trajectories in the vicinity of the default policy. Our approach utilizes a diffusion model for map prediction and sampling-based planning with B-spline trajectory optimization to generate the default policy. Extensive evaluations reveal that the context-generative policy outperforms the baseline methods in identifying and avoiding unseen obstacles. Additionally, real-world experiments conducted with Crazyflie drones demonstrate the adaptability of our proposed method, even when acting in environments outside the domain of the training distribution.
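Bounded rational agents often make decisions by evaluating a finite set of choices, typically derived from a reference point called the "default policy" that is based on previous experience. However, the inherent rigidity of a static default policy poses significant challenges when agents operate in unknown environments that are not covered by their prior knowledge. In this work, we introduce a context-generative default policy that uses the region observed by the robot to predict the unobserved part of the environment, so that the robot can adaptively adjust its default policy based on both the actually observed map and the imagined unobserved map. Furthermore, the adaptive nature of the bounded rationality framework lets the robot cope with unreliable or incorrect imaginations by selectively sampling a few trajectories near the default policy. Our approach uses a diffusion model for map prediction and sampling-based planning with B-spline trajectory optimization to generate the default policy. Extensive evaluations show that the context-generative policy outperforms baseline methods in identifying and avoiding unseen obstacles. In addition, real-world experiments with Crazyflie drones demonstrate the adaptability of the proposed method, even in environments outside the training distribution.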
https://arxiv.org/abs/2409.11604
Proton pencil beam scanning (PBS) treatment planning for head and neck (H&N) cancers is a time-consuming and experience-demanding task where a large number of planning objectives are involved. Deep reinforcement learning (DRL) has recently been introduced to the planning processes of intensity-modulated radiation therapy and brachytherapy for prostate, lung, and cervical cancers. However, existing approaches are built upon the Q-learning framework and weighted linear combinations of clinical metrics, suffering from poor scalability and flexibility and only capable of adjusting a limited number of planning objectives in discrete action spaces. We propose an automatic treatment planning model using the proximal policy optimization (PPO) algorithm and a dose distribution-based reward function for proton PBS treatment planning of H&N cancers. Specifically, a set of empirical rules is used to create auxiliary planning structures from target volumes and organs-at-risk (OARs), along with their associated planning objectives. These planning objectives are fed into an in-house optimization engine to generate the spot monitor unit (MU) values. A decision-making policy network trained using PPO is developed to iteratively adjust the involved planning objective parameters in a continuous action space and refine the PBS treatment plans using a novel dose distribution-based reward function. Proton H&N treatment plans generated by the model show improved OAR sparing with equal or superior target coverage when compared with human-generated plans. Moreover, additional experiments on liver cancer demonstrate that the proposed method can be successfully generalized to other treatment sites. To the best of our knowledge, this is the first DRL-based automatic treatment planning model capable of achieving human-level performance for H&N cancers.
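Proton pencil beam scanning (PBS) treatment planning for head and neck (H&N) cancers is a time-consuming and experience-demanding task where a large number of planning objectives are involved. Deep reinforcement learning (DRL) has recently been introduced to the planning processes of intensity-modulated radiation therapy and brachytherapy for prostate, lung, and cervical cancers. However, existing approaches are built upon the Q-learning framework and weighted linear combinations of clinical metrics, suffering from poor scalability and flexibility and only capable of adjusting a limited number of planning objectives in discrete action spaces. We propose an automatic treatment planning model that uses the proximal policy optimization (PPO) algorithm and a dose distribution-based reward function for proton PBS treatment planning of H&N cancers. Specifically, a set of empirical rules is used to create auxiliary planning structures from the target volumes and organs-at-risk (OARs), together with their associated planning objectives. These planning objectives are fed into an in-house optimization engine to generate the spot monitor unit (MU) values. A decision-making policy network trained with PPO iteratively adjusts the planning objective parameters in a continuous action space and refines the PBS treatment plans using a novel dose distribution-based reward function. Proton H&N treatment plans generated by the model show improved OAR sparing with equal or superior target coverage compared with human-generated plans. Additional experiments on liver cancer show that the proposed method generalizes successfully to other treatment sites. To the best of our knowledge, this is the first DRL-based automatic treatment planning model capable of achieving human-level performance for H&N cancers.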
https://arxiv.org/abs/2409.11576
Navigating rigid body objects through crowded environments can be challenging, especially when narrow passages are present. Existing sampling-based planners and optimization-based methods, such as mixed integer linear programming (MILP) formulations, suffer from limited scalability with respect to either the size of the workspace or the number of obstacles. To address the scalability issue, we propose a three-stage algorithm that first generates a graph of collision-free convex polytopes in the workspace, then poses a large set of small MILPs to generate viable paths between polytopes, and finally queries a pair of start and end configurations for a feasible path online. The graph of convex polytopes serves as a decomposition of the free workspace, and the number of decision variables in each MILP is limited by restricting the subproblem to two or three free polytopes rather than the entire free region. Our simulation results demonstrate shorter online computation time than baseline methods and better scaling with the size of the environment and tunnel width than sampling-based planners, in both 2D and 3D environments.
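Navigating rigid bodies through crowded environments can be challenging, especially when narrow passages are present. Existing sampling-based planners and optimization-based methods, such as mixed integer linear programming (MILP) formulations, scale poorly with either the size of the workspace or the number of obstacles. To address this scalability issue, we propose a three-stage algorithm that first generates a graph of collision-free convex polytopes in the workspace, then poses a large set of small MILPs to generate viable paths between polytopes, and finally, online, queries a feasible path for a given pair of start and end configurations. The graph of convex polytopes serves as a decomposition of the free workspace, and the number of decision variables in each MILP is limited by restricting each subproblem to two or three free polytopes rather than the entire free region. Our simulation results show shorter online computation time than baseline methods and better scaling with environment size and tunnel width than sampling-based planners, in both 2D and 3D environments.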
https://arxiv.org/abs/2409.11520
The growing demand for innovative research in the food industry is driving the adoption of robots in large-scale experimentation, as it offers increased precision, replicability, and efficiency in product manufacturing and evaluation. To this end, we introduce a robotic system designed to optimize food product quality, focusing on powdered cappuccino preparation as a case study. By leveraging optimization algorithms and computer vision, the robot explores the parameter space to identify the ideal conditions for producing a cappuccino with the best foam quality. The system also incorporates computer vision-driven feedback in a closed-loop control to further improve the beverage. Our findings demonstrate the effectiveness of robotic automation in achieving high repeatability and extensive parameter exploration, paving the way for more advanced and reliable food product development.
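The growing demand for innovative research in the food industry is driving the adoption of robots in large-scale experimentation, because robots offer greater precision, replicability, and efficiency in product manufacturing and evaluation. To this end, we introduce a robotic system designed to optimize food product quality, with powdered cappuccino preparation as a case study. Using optimization algorithms and computer vision, the robot explores the parameter space to identify the ideal conditions for producing a cappuccino with the best foam quality. The system also incorporates computer vision-driven feedback in a closed control loop to further improve the beverage. Our findings demonstrate the effectiveness of robotic automation in achieving high repeatability and extensive parameter exploration, paving the way for more advanced and reliable food product development.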
https://arxiv.org/abs/2409.11499
The integration of artificial intelligence (AI) and optimization holds substantial promise for improving the efficiency, reliability, and resilience of engineered systems. Due to the networked nature of many engineered systems, ethically deploying methodologies at this intersection poses challenges that are distinct from other AI settings, thus motivating the development of ethical guidelines tailored to AI-enabled optimization. This paper highlights the need to go beyond fairness-driven algorithms to systematically address ethical decisions spanning the stages of modeling, data curation, results analysis, and implementation of optimization-based decision support tools. Accordingly, this paper identifies ethical considerations required when deploying algorithms at the intersection of AI and optimization via case studies in power systems as well as supply chain and logistics. Rather than providing a prescriptive set of rules, this paper aims to foster reflection and awareness among researchers and encourage consideration of ethical implications at every step of the decision-making process.
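Combining artificial intelligence (AI) with optimization holds great promise for improving the efficiency, reliability, and resilience of engineered systems. Because many engineered systems are networked, deploying methods at this intersection ethically raises challenges distinct from those in other AI settings, which motivates ethical guidelines tailored to AI-enabled optimization. This paper highlights the need to go beyond fairness-driven algorithms and systematically address the ethical decisions that span modeling, data curation, results analysis, and the implementation of optimization-based decision support tools. Accordingly, through case studies in power systems and in supply chain and logistics, the paper identifies the ethical considerations that arise when deploying algorithms at the intersection of AI and optimization. Rather than offering a prescriptive set of rules, the paper aims to foster reflection and awareness among researchers and to encourage them to consider ethical implications at every step of the decision-making process.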
https://arxiv.org/abs/2409.11489
There is growing evidence of the effectiveness of Shampoo, a higher-order preconditioning method, over Adam in deep learning optimization tasks. However, Shampoo's drawbacks include additional hyperparameters and computational overhead when compared to Adam, which only updates running averages of first- and second-moment quantities. This work establishes a formal connection between Shampoo (implemented with the 1/2 power) and Adafactor -- a memory-efficient approximation of Adam -- showing that Shampoo is equivalent to running Adafactor in the eigenbasis of Shampoo's preconditioner. This insight leads to the design of a simpler and computationally efficient algorithm: $\textbf{S}$hampo$\textbf{O}$ with $\textbf{A}$dam in the $\textbf{P}$reconditioner's eigenbasis (SOAP). With regards to improving Shampoo's computational efficiency, the most straightforward approach would be to simply compute Shampoo's eigendecomposition less frequently. Unfortunately, as our empirical results show, this leads to performance degradation that worsens with this frequency. SOAP mitigates this degradation by continually updating the running average of the second moment, just as Adam does, but in the current (slowly changing) coordinate basis. Furthermore, since SOAP is equivalent to running Adam in a rotated space, it introduces only one additional hyperparameter (the preconditioning frequency) compared to Adam. We empirically evaluate SOAP on language model pre-training with 360m and 660m sized models. In the large batch regime, SOAP reduces the number of iterations by over 40% and wall clock time by over 35% compared to AdamW, with approximately 20% improvements in both metrics compared to Shampoo. An implementation of SOAP is available at this https URL.
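There is growing evidence that Shampoo, a higher-order preconditioning method, is more effective than Adam in deep learning optimization tasks. Shampoo's drawbacks, however, include additional hyperparameters and computational overhead compared with Adam, which only updates running averages of first- and second-moment quantities. This work establishes a formal connection between Shampoo (implemented with the 1/2 power) and Adafactor, a memory-efficient approximation of Adam, showing that Shampoo is equivalent to running Adafactor in the eigenbasis of Shampoo's preconditioner. This insight leads to a simpler and computationally efficient algorithm: SOAP (Shampoo with Adam in the Preconditioner's eigenbasis). The most straightforward way to improve Shampoo's computational efficiency would be to compute its eigendecomposition less frequently. Unfortunately, as our empirical results show, this degrades performance, and the degradation worsens as the eigendecomposition becomes less frequent. SOAP mitigates the degradation by continually updating the running average of the second moment, just as Adam does, but in the current (slowly changing) coordinate basis. Furthermore, because SOAP is equivalent to running Adam in a rotated space, it introduces only one additional hyperparameter (the preconditioning frequency) compared with Adam. We evaluate SOAP empirically on language model pre-training with 360m and 660m sized models. In the large batch regime, SOAP reduces the number of iterations by over 40% and wall clock time by over 35% compared with AdamW, with approximately 20% improvements in both metrics compared with Shampoo. An implementation of SOAP is available at this https URL.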
https://arxiv.org/abs/2409.11321
Process-mining techniques have emerged as powerful tools for analyzing event data to gain insights into business processes. In this paper, we present a comprehensive analysis of road traffic fine management processes using the pm4py library in Python. We start by importing an event log dataset and explore its characteristics, including the distribution of activities and process variants. Through filtering and statistical analysis, we uncover key patterns and variations in the process executions. Subsequently, we apply various process-mining algorithms, including the Alpha Miner, Inductive Miner, and Heuristic Miner, to discover process models from the event log data. We visualize the discovered models to understand the workflow structures and dependencies within the process. Additionally, we discuss the strengths and limitations of each mining approach in capturing the underlying process dynamics. Our findings shed light on the efficiency and effectiveness of road traffic fine management processes, providing valuable insights for process optimization and decision-making. This study demonstrates the utility of pm4py in facilitating process mining tasks and its potential for analyzing real-world business processes.
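Process-mining techniques have become powerful tools for analyzing event data to gain insight into business processes. In this paper, we present a comprehensive analysis of road traffic fine management processes using the pm4py library in Python. We start by importing an event log dataset and exploring its characteristics, including the distribution of activities and process variants. Through filtering and statistical analysis, we uncover key patterns and variations in the process executions. We then apply several process-mining algorithms, including the Alpha Miner, Inductive Miner, and Heuristic Miner, to discover process models from the event log. We visualize the discovered models to understand the workflow structures and dependencies within the process. We also discuss the strengths and limitations of each mining approach in capturing the underlying process dynamics. Our findings shed light on the efficiency and effectiveness of road traffic fine management processes and provide valuable insights for process optimization and decision-making. The study demonstrates the utility of pm4py for process mining tasks and its potential for analyzing real-world business processes.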
https://arxiv.org/abs/2409.11294
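Two of the exploratory statistics mentioned in the abstract above, process variants and the directly-follows relation that miners such as the Alpha Miner and Heuristic Miner build on, can be computed with the standard library alone. The sketch below does not use the pm4py API; the in-memory trace format (one list of activity names per case) and the activity names in the usage note are illustrative assumptions.

```python
from collections import Counter


def directly_follows(traces):
    """Count how often activity a is immediately followed by activity b."""
    counts = Counter()
    for trace in traces:
        counts.update(zip(trace, trace[1:]))
    return counts


def variants(traces):
    """Count how many cases follow each distinct activity sequence."""
    return Counter(tuple(trace) for trace in traces)
```

For instance, given the hypothetical traces `[["Create Fine", "Payment"], ["Create Fine", "Send Fine", "Payment"], ["Create Fine", "Payment"]]`, `variants` reports that the two-step sequence occurs twice, and `directly_follows` counts the pair `("Create Fine", "Payment")` twice.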
The Vehicle Routing Problem is about optimizing the routes of vehicles to meet the needs of customers at specific locations. The route graph consists of depots on several levels and customer positions. Several optimization methods have been developed over the years, most of which are based on some type of classic heuristic: genetic algorithm, simulated annealing, tabu search, ant colony optimization, firefly algorithm. Recent developments in machine learning provide a new toolset, the rich family of neural networks, for tackling complex problems. The main area of application of neural networks is the area of classification and regression. Route optimization can be viewed as a new challenge for neural networks. The article first presents an analysis of the applicability of neural network tools, then a novel graphical neural network model is presented in detail. The efficiency analysis based on test experiments shows the applicability of the proposed NN architecture.
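The Vehicle Routing Problem is about optimizing the routes of vehicles to meet the needs of customers at specific locations. The route graph consists of depots on several levels and customer positions. Many optimization methods have been developed over the years, most of them based on some classic heuristic: genetic algorithm, simulated annealing, tabu search, ant colony optimization, or firefly algorithm. Recent developments in machine learning provide a new toolset for tackling complex problems: the rich family of neural networks. The main application area of neural networks is classification and regression, so route optimization can be viewed as a new challenge for them. The article first analyzes the applicability of neural network tools and then presents a novel graphical neural network model in detail. An efficiency analysis based on test experiments shows the applicability of the proposed NN architecture.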
https://arxiv.org/abs/2409.11290
This work proposes an approach that integrates reinforcement learning and model predictive control (MPC) to efficiently solve finite-horizon optimal control problems in mixed-logical dynamical systems. Optimization-based control of such systems with discrete and continuous decision variables entails the online solution of mixed-integer quadratic or linear programs, which suffer from the curse of dimensionality. Our approach aims at mitigating this issue by effectively decoupling the decision on the discrete variables and the decision on the continuous variables. Moreover, to mitigate the combinatorial growth in the number of possible actions due to the prediction horizon, we conceive the definition of decoupled Q-functions to make the learning problem more tractable. The use of reinforcement learning reduces the online optimization problem of the MPC controller from a mixed-integer linear (quadratic) program to a linear (quadratic) program, greatly reducing the computational time. Simulation experiments for a microgrid, based on real-world data, demonstrate that the proposed method significantly reduces the online computation time of the MPC approach and that it generates policies with small optimality gaps and high feasibility rates.
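This work proposes an approach that integrates reinforcement learning and model predictive control (MPC) to efficiently solve finite-horizon optimal control problems in mixed-logical dynamical systems. Optimization-based control of such systems, which have both discrete and continuous decision variables, requires solving mixed-integer quadratic or linear programs online, and these suffer from the curse of dimensionality. Our approach mitigates this issue by effectively decoupling the decision on the discrete variables from the decision on the continuous variables. Moreover, to mitigate the combinatorial growth in the number of possible actions caused by the prediction horizon, we define decoupled Q-functions that make the learning problem more tractable. Using reinforcement learning reduces the online optimization problem of the MPC controller from a mixed-integer linear (quadratic) program to a linear (quadratic) program, greatly reducing computation time. Simulation experiments for a microgrid, based on real-world data, show that the proposed method significantly reduces the online computation time of the MPC approach and generates policies with small optimality gaps and high feasibility rates.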
https://arxiv.org/abs/2409.11267