The development of large vision-language models, notably CLIP, has catalyzed research into effective adaptation techniques, with a particular focus on soft prompt tuning. Conjointly, test-time augmentation, which utilizes multiple augmented views of a single image to enhance zero-shot generalization, is emerging as a significant area of interest. This has predominantly directed research efforts toward test-time prompt tuning. In contrast, we introduce a robust MeanShift for Test-time Augmentation (MTA), which surpasses prompt-based methods without requiring this intensive training procedure. This positions MTA as an ideal solution for both standalone and API-based applications. Additionally, our method does not rely on ad hoc rules (e.g., confidence thresholds) used in some previous test-time augmentation techniques to filter the augmented views. Instead, MTA incorporates a quality assessment variable for each view directly into its optimization process, termed the inlierness score. This score is jointly optimized with a density mode seeking process, leading to an efficient training- and hyperparameter-free approach. We extensively benchmark our method on 15 datasets and demonstrate MTA's superiority and computational efficiency. Easily deployed as a plug-and-play module on top of zero-shot models and state-of-the-art few-shot methods, MTA shows systematic and consistent improvements.
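The core computation described here, mean-shift mode seeking over augmented-view embeddings where each view carries a quality weight, can be illustrated with a minimal NumPy sketch. Note this is a simplified version with fixed inlierness weights (the actual MTA jointly optimizes them), and all names and values below are hypothetical:

```python
import numpy as np

def weighted_mean_shift(embeddings, weights, bandwidth=0.5, iters=50):
    """Seek the density mode of view embeddings, down-weighting
    low-inlierness views. A simplified sketch, not the exact MTA update."""
    mode = np.average(embeddings, axis=0, weights=weights)  # init at weighted mean
    for _ in range(iters):
        d2 = np.sum((embeddings - mode) ** 2, axis=1)
        k = weights * np.exp(-d2 / (2 * bandwidth ** 2))  # Gaussian kernel x inlierness
        mode = (k[:, None] * embeddings).sum(axis=0) / k.sum()
    return mode

# Toy example: seven "good" views clustered near 1.0, one outlier view at 5.0.
views = np.vstack([np.full((7, 4), 1.0) + 0.01 * np.arange(7)[:, None],
                   np.full((1, 4), 5.0)])
inlierness = np.array([1.0] * 7 + [0.05])  # low score for the outlier view
mode = weighted_mean_shift(views, inlierness)  # lands in the main cluster
```

The kernel term pulls the mode toward dense regions of embedding space, while the inlierness factor keeps spurious augmented views from dragging it away.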
https://arxiv.org/abs/2405.02266
We present X-SLAM, a real-time dense differentiable SLAM system that leverages the complex-step finite difference (CSFD) method for efficient calculation of numerical derivatives, bypassing the need for a large-scale computational graph. The key to our approach is treating the SLAM process as a differentiable function, enabling the calculation of the derivatives of important SLAM parameters through Taylor series expansion within the complex domain. Our system allows for the real-time calculation of not just the gradient, but also higher-order differentiation. This facilitates the use of high-order optimizers to achieve better accuracy and faster convergence. Building on X-SLAM, we implemented end-to-end optimization frameworks for two important tasks: camera relocalization in wide outdoor scenes and active robotic scanning in complex indoor environments. Comprehensive evaluations on public benchmarks and intricate real scenes underscore the improvements in the accuracy of camera relocalization and the efficiency of robotic navigation achieved through our task-aware optimization. The code and data are available at this https URL.
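The CSFD idea at the heart of this system is easy to demonstrate in isolation: evaluating a function at a complex-perturbed point yields the derivative from the imaginary part of the Taylor expansion, with no subtractive cancellation. A generic sketch of the technique (not the X-SLAM implementation itself):

```python
import numpy as np

def csfd_derivative(f, x, h=1e-20):
    """Complex-step finite difference: f'(x) ~= Im(f(x + i*h)) / h.
    Unlike the real forward difference (f(x+h) - f(x)) / h, there is no
    subtraction of nearby values, so the step h can be made tiny without
    cancellation error."""
    return np.imag(f(x + 1j * h)) / h

# Example: d/dx [x^2 * sin(x)] = 2x*sin(x) + x^2*cos(x)
f = lambda x: x ** 2 * np.sin(x)
x0 = 1.3
deriv = csfd_derivative(f, x0)
exact = 2 * x0 * np.sin(x0) + x0 ** 2 * np.cos(x0)
```

Even with h = 1e-20 the result matches the analytic derivative to machine precision, which is what makes CSFD attractive for real-time numerical differentiation without a computational graph.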
https://arxiv.org/abs/2405.02187
We consider a set of challenging sequential manipulation puzzles, where an agent has to interact with multiple movable objects and navigate narrow passages. Such settings are notoriously difficult for Task-and-Motion Planners, as they require interdependent regrasps and solving hard motion planning problems. In this paper, we propose to search over sequences of easier pick-and-place subproblems, which can lead to the solution of the manipulation puzzle. Our method combines a heuristic-driven forward search of subproblems with an optimization-based Task-and-Motion Planning solver. To guide the search, we introduce heuristics to generate and prioritize useful subgoals. We evaluate our approach on various manually designed and automatically generated scenes, demonstrating the benefits of auxiliary subproblems in sequential manipulation planning.
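The "heuristic-driven forward search of subproblems" can be pictured as a best-first search whose nodes are abstract states and whose edges are pick-and-place subgoals. Below is a generic skeleton under that reading; the state encoding, successor generator, and heuristic are all hypothetical stand-ins for calls to a task-and-motion solver, not the paper's planner:

```python
import heapq

def subgoal_search(start, is_goal, successors, heuristic):
    """Heuristic-driven forward search over sequences of pick-and-place
    subgoals. A generic best-first skeleton: pop the most promising state,
    expand candidate subgoals, and return the subgoal sequence on success."""
    frontier = [(heuristic(start), start, [])]
    seen = {start}
    while frontier:
        _, state, plan = heapq.heappop(frontier)
        if is_goal(state):
            return plan
        for subgoal, nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                priority = len(plan) + 1 + heuristic(nxt)
                heapq.heappush(frontier, (priority, nxt, plan + [subgoal]))
    return None

# Toy puzzle: a blocking object at slot 1 must be moved aside before the
# target object can travel from slot 0 to slot 2. State: (target, blocker).
def successors(state):
    target, blocker = state
    moves = []
    if blocker == 1:
        moves.append(("move blocker aside", (target, 3)))
    if blocker != 1 and target == 0:
        moves.append(("place target at goal", (2, blocker)))
    return moves

plan = subgoal_search((0, 1), lambda s: s[0] == 2, successors,
                      lambda s: 0 if s[0] == 2 else 1)
```

In the paper's setting each successor would itself be a pick-and-place subproblem handed to the optimization-based solver; here the transitions are resolved symbolically to keep the skeleton self-contained.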
https://arxiv.org/abs/2405.02053
Recent advancements have showcased the potential of handheld millimeter-wave (mmWave) imaging, which applies synthetic aperture radar (SAR) principles in portable settings. However, existing studies addressing handheld motion errors either rely on costly tracking devices or employ simplified imaging models, leading to impractical deployment or limited performance. In this paper, we present IFNet, a novel deep unfolding network that combines the strengths of signal processing models and deep neural networks to achieve robust imaging and focusing for handheld mmWave systems. We first formulate the handheld imaging model by integrating multiple priors about mmWave images and handheld phase errors. Furthermore, we transform the optimization processes into an iterative network structure for improved and efficient imaging performance. Extensive experiments demonstrate that IFNet effectively compensates for handheld phase errors and recovers high-fidelity images from severely distorted signals. In comparison with existing methods, IFNet can achieve at least 11.89 dB improvement in average peak signal-to-noise ratio (PSNR) and 64.91% improvement in average structural similarity index measure (SSIM) on a real-world dataset.
https://arxiv.org/abs/2405.02023
Traditional mathematical programming solvers require long computational times to solve constrained minimization problems of complex and large-scale physical systems. Therefore, these problems are often transformed into unconstrained ones, and solved with computationally efficient optimization approaches based on first-order information, such as the gradient descent method. However, for unconstrained problems, balancing the minimization of the objective function with the reduction of constraint violations is challenging. We consider the class of time-dependent minimization problems with an increasing (possibly nonlinear and non-convex) objective function and non-decreasing (possibly nonlinear and non-convex) inequality constraints. To efficiently solve them, we propose a penalty-based guardrail algorithm (PGA). This algorithm adapts a standard penalty-based method by dynamically updating the right-hand side of the constraints with a guardrail variable, which adds a margin to prevent violations. We evaluate PGA on two novel application domains: a simplified model of a district heating system and an optimization model derived from learned deep neural networks. Our method significantly outperforms mathematical programming solvers and the standard penalty-based method, and achieves better performance and faster convergence than a state-of-the-art algorithm (IPDD) within a specified time limit.
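The penalty-plus-guardrail idea can be sketched on a one-dimensional toy problem: take gradient steps on a penalized objective while a guardrail variable tightens the constraint's right-hand side whenever it is violated. This is a simplified illustration of the mechanism under stated assumptions, not the paper's exact PGA update rule; all parameter values are hypothetical:

```python
def pga_step(x, grad_f, g, grad_g, guardrail, rho=10.0, lr=0.01, margin_rate=0.1):
    """One gradient step on a quadratic-penalty objective, where the guardrail
    tightens the constraint g(x) <= 0 by an adaptive margin."""
    violation = max(0.0, g(x) + guardrail)           # tightened constraint
    grad = grad_f(x) + 2 * rho * violation * grad_g(x)
    x_new = x - lr * grad
    # Grow the margin while the true constraint is violated, shrink otherwise.
    guardrail = max(0.0, guardrail + margin_rate * g(x_new))
    return x_new, guardrail

# Toy problem: minimize f(x) = (x - 3)^2 subject to g(x) = x - 1 <= 0,
# whose constrained optimum is x = 1.
f_grad = lambda x: 2 * (x - 3)
g = lambda x: x - 1.0
g_grad = lambda x: 1.0

x, guard = 0.0, 0.0
for _ in range(2000):
    x, guard = pga_step(x, f_grad, g, g_grad, guard)
```

A plain quadratic penalty with fixed weight settles slightly on the infeasible side of the boundary; the growing guardrail margin pushes the iterate back until the true constraint holds with (near) equality.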
https://arxiv.org/abs/2405.01984
We present a baseline for the SemEval 2024 task 2 challenge, whose objective is to ascertain the inference relationship between pairs of clinical trial report sections and statements. We apply prompt optimization techniques with LLM Instruct models provided as a Language Model-as-a-Service (LMaaS). We observed, in line with recent findings, that synthetic CoT prompts significantly enhance manually crafted ones.
https://arxiv.org/abs/2405.01942
Modular reconfigurable manipulators enable quick adaptation and versatility to address different application environments and tailor to the specific requirements of tasks. Task performance significantly depends on the manipulator's mounted pose and morphology design, therefore posing the need for methodologies for selecting suitable modular robot configurations and mounted poses that can address the specific task requirements and required performance. Morphological changes in modular robots can be derived through a discrete optimization process involving the selective addition or removal of modules. In contrast, the adjustment of the mounted pose operates within a continuous space, allowing for smooth and precise alterations in both orientation and position. This work introduces a computational framework that simultaneously optimizes modular manipulators' mounted pose and morphology. The core of the work is that we design a mapping function that implicitly captures the morphological state of manipulators in the continuous space. This transformation function unifies the optimization of mounted pose and morphology within a continuous space. Furthermore, our optimization framework incorporates an array of performance metrics, such as minimum joint effort and maximum manipulability, and considerations for trajectory execution error and physical and safety constraints. To highlight our method's benefits, we compare it with previous methods that framed such problems as combinatorial optimization problems and demonstrate its practicality in selecting the modular robot configuration for executing a drilling task with the CONCERT modular robotic platform.
https://arxiv.org/abs/2405.01923
The neural combinatorial optimization (NCO) approach has shown great potential for solving routing problems without the requirement of expert knowledge. However, existing constructive NCO methods cannot directly solve large-scale instances, which significantly limits their application prospects. To address these crucial shortcomings, this work proposes a novel Instance-Conditioned Adaptation Model (ICAM) for better large-scale generalization of neural combinatorial optimization. In particular, we design a powerful yet lightweight instance-conditioned adaptation module for the NCO model to generate better solutions for instances across different scales. In addition, we develop an efficient three-stage reinforcement learning-based training scheme that enables the model to learn cross-scale features without any labeled optimal solution. Experimental results show that our proposed method is capable of obtaining excellent results with a very fast inference time in solving Traveling Salesman Problems (TSPs) and Capacitated Vehicle Routing Problems (CVRPs) across different scales. To the best of our knowledge, our model achieves state-of-the-art performance among all RL-based constructive methods for TSP and CVRP with up to 1,000 nodes.
https://arxiv.org/abs/2405.01906
As the capabilities of Large Language Models (LLMs) in healthcare and medicine continue to advance, there is a growing need for competitive open-source models that can safeguard public interest. With the increasing availability of highly competitive open base models, the impact of continued pre-training is increasingly uncertain. In this work, we explore the role of instruct tuning, model merging, alignment, red teaming and advanced inference schemes, as means to improve current open models. To that end, we introduce the Aloe family, a set of open medical LLMs highly competitive within its scale range. Aloe models are trained on the current best base models (Mistral, LLaMA 3), using a new custom dataset which combines public data sources improved with synthetic Chain of Thought (CoT). Aloe models undergo an alignment phase, becoming one of the first few policy-aligned open healthcare LLMs using Direct Preference Optimization, setting a new standard for ethical performance in healthcare LLMs. Model evaluation expands to include various bias and toxicity datasets, a dedicated red teaming effort, and a much-needed risk assessment for healthcare LLMs. Finally, to explore the limits of current LLMs in inference, we study several advanced prompt engineering strategies to boost performance across benchmarks, yielding state-of-the-art results for open healthcare 7B LLMs, unprecedented at this scale.
https://arxiv.org/abs/2405.01886
Healthcare monitoring is crucial, especially for the daily care of elderly individuals living alone. It can detect dangerous occurrences, such as falls, and provide timely alerts to save lives. Non-invasive millimeter wave (mmWave) radar-based healthcare monitoring systems using advanced human activity recognition (HAR) models have recently gained significant attention. However, they encounter challenges in handling sparse point clouds, achieving real-time continuous classification, and coping with limited monitoring ranges when statically mounted. To overcome these limitations, we propose RobHAR, a movable robot-mounted mmWave radar system with lightweight deep neural networks for real-time monitoring of human activities. Specifically, we first propose a sparse point cloud-based global embedding to learn the features of point clouds using the light-PointNet (LPN) backbone. Then, we learn the temporal pattern with a bidirectional lightweight LSTM model (BiLiLSTM). In addition, we implement a transition optimization strategy, integrating the Hidden Markov Model (HMM) with Connectionist Temporal Classification (CTC) to improve the accuracy and robustness of the continuous HAR. Our experiments on three datasets indicate that our method significantly outperforms the previous studies in both discrete and continuous HAR tasks. Finally, we deploy our system on a movable robot-mounted edge computing platform, achieving flexible healthcare monitoring in real-world scenarios.
https://arxiv.org/abs/2405.01882
There is a growing demand to deploy computation-intensive deep learning (DL) models on resource-constrained mobile devices for real-time intelligent applications. Equipped with a variety of processing units such as CPUs, GPUs, and NPUs, the mobile devices hold potential to accelerate DL inference via parallel execution across heterogeneous processors. Various efficient parallel methods have been explored to optimize computation distribution, achieve load balance, and minimize communication cost across processors. Yet their practical effectiveness in the dynamic and diverse real-world mobile environment is less explored. This paper presents a holistic empirical study to assess the capabilities and challenges associated with parallel DL inference on heterogeneous mobile processors. Through carefully designed experiments covering various DL models, mobile software/hardware environments, workload patterns, and resource availability, we identify limitations of existing techniques and highlight opportunities for cross-level optimization.
https://arxiv.org/abs/2405.01851
This paper studies algorithmic decision-making in the presence of strategic individual behaviors, where an ML model is used to make decisions about human agents and the latter can adapt their behavior strategically to improve their future data. Existing results on strategic learning have largely focused on the linear setting where agents with linear labeling functions best respond to a (noisy) linear decision policy. Instead, this work focuses on general non-linear settings where agents respond to the decision policy with only "local information" of the policy. Moreover, we simultaneously consider the objectives of maximizing decision-maker welfare (model prediction accuracy), social welfare (agent improvement caused by strategic behaviors), and agent welfare (the extent that ML underestimates the agents). We first generalize the agent best response model in previous works to the non-linear setting, then reveal the compatibility of welfare objectives. We show that the three welfare objectives can attain their optima simultaneously only under restrictive conditions which are challenging to achieve in non-linear settings. The theoretical results imply that existing works solely maximizing the welfare of a subset of parties inevitably diminish the welfare of the others. We thus claim the necessity of balancing the welfare of each party in non-linear settings and propose an irreducible optimization algorithm suitable for general strategic learning. Experiments on synthetic and real data validate the proposed algorithm.
https://arxiv.org/abs/2405.01810
Robotic applications across industries demand advanced navigation for safe and smooth movement. Smooth path planning is crucial for mobile robots to ensure stable and efficient navigation, as it minimizes jerky movements and enhances overall performance. Achieving this requires smooth, collision-free paths. Particle Swarm Optimization (PSO) and Potential Field (PF) methods are notable path-planning techniques; however, they may struggle to produce smooth paths due to their inherent algorithms, potentially leading to suboptimal robot motion and increased energy consumption. In addition, while PSO efficiently explores solution spaces, it generates long paths and has limited global search. On the contrary, PF methods offer concise paths but struggle with distant targets or obstacles. To address this, we propose Smoothed Particle Swarm Optimization with Improved Potential Field (SPSO-IPF), which combines both approaches and is capable of generating a smooth and safe path. Our research demonstrates SPSO-IPF's superiority, proving its effectiveness in static and dynamic environments compared to a mere PSO or a mere PF approach.
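For readers unfamiliar with the PSO half of this hybrid, the standard particle swarm update (each particle blends inertia with attraction to its personal best and the swarm best) can be sketched as follows. This is textbook PSO on a toy objective standing in for path cost, not the paper's SPSO-IPF; parameter values are conventional defaults:

```python
import random

def pso_minimize(f, dim=2, n_particles=20, iters=200,
                 w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0)):
    """Standard particle swarm optimization: velocity = inertia
    + cognitive pull (personal best) + social pull (global best)."""
    random.seed(0)
    lo, hi = bounds
    pos = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    best_i = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[best_i][:], pbest_val[best_i]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Toy objective: squared distance to a goal at (2, -1), a stand-in for path cost.
goal = (2.0, -1.0)
best, best_val = pso_minimize(lambda p: (p[0] - goal[0]) ** 2 + (p[1] - goal[1]) ** 2)
```

A path planner would replace the toy objective with a cost over candidate waypoints (length, clearance, smoothness), which is where the potential-field term of SPSO-IPF would enter.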
https://arxiv.org/abs/2405.01794
The increased deployment of multi-robot systems (MRS) in various fields has led to the need for analysis of system-level performance. However, creating consistent metrics for MRS is challenging due to the wide range of system and environmental factors, such as team size and environment size. This paper presents a new analytical framework for MRS based on dimensionless variable analysis, a mathematical technique typically used to simplify complex physical systems. This approach effectively condenses the complex parameters influencing MRS performance into a manageable set of dimensionless variables. We form dimensionless variables which encapsulate key parameters of the robot team and task. Then we use these dimensionless variables to fit a parametric model of team performance. Our model successfully identifies critical performance determinants and their interdependencies, providing insight for MRS design and optimization. The application of dimensionless variable analysis to MRS offers a promising method for MRS analysis that effectively reduces complexity, enhances comprehension of system behaviors, and informs the design and management of future MRS deployments.
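The flavor of dimensionless variable analysis can be shown with a small example: combine raw team and task parameters into a unitless group so that scenarios of different physical scale become directly comparable. The particular group below is hypothetical, chosen only to illustrate the unit bookkeeping; it is not one of the paper's fitted variables:

```python
def coverage_number(n_robots, speed, sensor_width, mission_time, env_area):
    """A hypothetical dimensionless group for a coverage task:
    total area swept by the team divided by the environment area.
    Units: [robots] * [m/s] * [m] * [s] / [m^2] -> dimensionless."""
    return n_robots * speed * sensor_width * mission_time / env_area

# Two scenarios that differ in physical scale but share the same
# dimensionless state, and would be predicted to perform alike:
small = coverage_number(n_robots=4, speed=1.0, sensor_width=0.5,
                        mission_time=100.0, env_area=400.0)
large = coverage_number(n_robots=16, speed=2.0, sensor_width=1.0,
                        mission_time=200.0, env_area=12800.0)
```

Fitting a performance model against such groups, rather than the raw parameters, is what collapses the many system and environment factors into a manageable set.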
https://arxiv.org/abs/2405.01771
Traditional optimization-based planners, while effective, suffer from high computational costs, resulting in slow trajectory generation. A successful strategy to reduce computation time involves using Imitation Learning (IL) to develop fast neural network (NN) policies from those planners, which are treated as expert demonstrators. Although the resulting NN policies are effective at quickly generating trajectories similar to those from the expert, (1) their output does not explicitly account for dynamic feasibility, and (2) the policies do not accommodate changes in the constraints different from those used during training. To overcome these limitations, we propose Constraint-Guided Diffusion (CGD), a novel IL-based approach to trajectory planning. CGD leverages a hybrid learning/online optimization scheme that combines diffusion policies with a surrogate efficient optimization problem, enabling the generation of collision-free, dynamically feasible trajectories. The key ideas of CGD include dividing the original challenging optimization problem solved by the expert into two more manageable sub-problems: (a) efficiently finding collision-free paths, and (b) determining a dynamically-feasible time-parametrization for those paths to obtain a trajectory. Compared to conventional neural network architectures, we demonstrate through numerical evaluations significant improvements in performance and dynamic feasibility under scenarios with new constraints never encountered during training.
https://arxiv.org/abs/2405.01758
Deep learning has made significant progress in computer vision, specifically in image classification, object detection, and semantic segmentation. The skip connection has played an essential role in the architecture of deep neural networks, enabling easier optimization through residual learning during the training stage and improving accuracy during testing. Many neural networks have inherited the idea of residual learning with skip connections for various tasks, and it has been the standard choice for designing neural networks. This survey provides a comprehensive summary and outlook on the development of skip connections in deep neural networks. The short history of skip connections is outlined, and the development of residual learning in deep neural networks is surveyed. The effectiveness of skip connections in the training and testing stages is summarized, and future directions for using skip connections in residual learning are discussed. Finally, we summarize seminal papers, source code, models, and datasets that utilize skip connections in computer vision, including image classification, object detection, semantic segmentation, and image reconstruction. We hope this survey could inspire peer researchers in the community to further develop skip connections in various forms and tasks and the theory of residual learning in deep neural networks. The project page can be found at this https URL
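The residual learning pattern the survey covers reduces to y = x + F(x): the skip connection adds the input back onto the transformed features, so the identity path carries gradients even when F is poorly conditioned. A minimal NumPy sketch of one residual block (a generic illustration, not any specific architecture from the survey):

```python
import numpy as np

def residual_block(x, w1, w2):
    """y = x + F(x): a two-layer transform F with a skip connection.
    If F outputs zero, the block is exactly the identity mapping."""
    h = np.maximum(0.0, x @ w1)   # linear layer + ReLU
    return x + h @ w2             # skip connection adds the input back

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 8))
w1 = rng.normal(scale=0.1, size=(8, 8))
w2 = rng.normal(scale=0.1, size=(8, 8))
y = residual_block(x, w1, w2)

# With zero weights the block reduces to the identity, which is why deep
# stacks of such blocks are easy to optimize from initialization:
y_id = residual_block(x, np.zeros((8, 8)), np.zeros((8, 8)))
```

This identity-by-default behavior is the "easier optimization through residual learning" the survey refers to: each block only has to learn a perturbation of the identity.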
https://arxiv.org/abs/2405.01725
Ensuring the safety of Reinforcement Learning (RL) is crucial for its deployment in real-world applications. Nevertheless, managing the trade-off between reward and safety during exploration presents a significant challenge. Improving reward performance through policy adjustments may adversely affect safety performance. In this study, we aim to address this conflicting relation by leveraging the theory of gradient manipulation. Initially, we analyze the conflict between reward and safety gradients. Subsequently, we tackle the balance between reward and safety optimization by proposing a soft switching policy optimization method, for which we provide convergence analysis. Based on our theoretical examination, we provide a safe RL framework to overcome the aforementioned challenge, and we develop a Safety-MuJoCo Benchmark to assess the performance of safe RL algorithms. Finally, we evaluate the effectiveness of our method on the Safety-MuJoCo Benchmark and a popular safe benchmark, Omnisafe. Experimental results demonstrate that our algorithms outperform several state-of-the-art baselines in terms of balancing reward and safety optimization.
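The reward/safety gradient conflict analyzed here occurs when the two gradients have a negative inner product, so following one degrades the other. A common gradient-manipulation remedy is to project away the conflicting component; the sketch below shows that generic projection idea (the paper's soft-switching update is a different, more elaborate scheme):

```python
import numpy as np

def resolve_conflict(g_reward, g_safety):
    """If the reward and safety gradients conflict (negative inner product),
    remove from the reward gradient its component along the safety gradient,
    so the update no longer degrades safety. A generic projection sketch."""
    dot = g_reward @ g_safety
    if dot < 0:
        g_reward = g_reward - (dot / (g_safety @ g_safety)) * g_safety
    return g_reward

g_r = np.array([1.0, 1.0])    # reward-improving direction
g_s = np.array([-1.0, 0.0])   # following g_r would hurt safety along axis 0
g_fixed = resolve_conflict(g_r, g_s)  # conflicting component removed
```

After projection the adjusted reward gradient is orthogonal to the safety gradient, so a step along it leaves the (first-order) safety objective unchanged.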
https://arxiv.org/abs/2405.01677
Art reinterpretation is the practice of creating a variation of a reference work, making a paired artwork that exhibits a distinct artistic style. We ask if such an image pair can be used to customize a generative model to capture the demonstrated stylistic difference. We propose Pair Customization, a new customization method that learns stylistic difference from a single image pair and then applies the acquired style to the generation process. Unlike existing methods that learn to mimic a single concept from a collection of images, our method captures the stylistic difference between paired images. This allows us to apply a stylistic change without overfitting to the specific image content in the examples. To address this new task, we employ a joint optimization method that explicitly separates the style and content into distinct LoRA weight spaces. We optimize these style and content weights to reproduce the style and content images while encouraging their orthogonality. During inference, we modify the diffusion process via a new style guidance based on our learned weights. Both qualitative and quantitative experiments show that our method can effectively learn style while avoiding overfitting to image content, highlighting the potential of modeling such stylistic differences from a single image pair.
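The "encouraging their orthogonality" step can be made concrete with a toy regularizer: penalize overlap between the column spaces of the style and content weight updates. This is a hypothetical illustration of such a penalty on raw matrices, not the paper's exact loss over LoRA weight spaces:

```python
import numpy as np

def orthogonality_penalty(delta_style, delta_content):
    """Penalize overlap between two low-rank weight updates by the squared
    Frobenius norm of their cross-Gram matrix; zero iff the column spaces
    are orthogonal. A hypothetical regularizer illustrating the idea."""
    return float(np.sum((delta_style.T @ delta_content) ** 2))

rng = np.random.default_rng(0)
d_style = rng.normal(size=(16, 4))          # stand-in for a style update

# Build a content update orthogonal to the style update's column space
# via QR; it then incurs (numerically) zero penalty.
q, _ = np.linalg.qr(np.hstack([d_style, rng.normal(size=(16, 4))]))
d_content = q[:, 4:8]
```

Driving this penalty toward zero during the joint optimization is one way to keep the style change from leaking into, or overwriting, the content directions.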
https://arxiv.org/abs/2405.01536
Alignment is a standard procedure to fine-tune pre-trained large language models (LLMs) to follow natural language instructions and serve as helpful AI assistants. We have observed, however, that the conventional alignment process fails to enhance the factual accuracy of LLMs, and often leads to the generation of more false facts (i.e. hallucination). In this paper, we study how to make the LLM alignment process more factual, by first identifying factors that lead to hallucination in both alignment steps: supervised fine-tuning (SFT) and reinforcement learning (RL). In particular, we find that training the LLM on new knowledge or unfamiliar texts can encourage hallucination. This makes SFT less factual as it trains on human labeled data that may be novel to the LLM. Furthermore, reward functions used in standard RL can also encourage hallucination, because they guide the LLM to provide more helpful responses on a diverse set of instructions, often preferring longer and more detailed responses. Based on these observations, we propose factuality-aware alignment, comprised of factuality-aware SFT and factuality-aware RL through direct preference optimization. Experiments show that our proposed factuality-aware alignment guides LLMs to output more factual responses while maintaining instruction-following capability.
https://arxiv.org/abs/2405.01525
Varied approaches for aligning language models have been proposed, including supervised fine-tuning, RLHF, and direct optimization methods such as DPO. Although DPO has rapidly gained popularity due to its straightforward training process and competitive results, there is an open question of whether there remain practical advantages of using a discriminator, like a reward model, to evaluate responses. We propose D2PO, discriminator-guided DPO, an approach for the online setting where preferences are being collected throughout learning. As we collect gold preferences, we use these not only to train our policy, but to train a discriminative response evaluation model to silver-label even more synthetic data for policy training. We explore this approach across a set of diverse tasks, including a realistic chat setting, and find that our approach leads to higher-quality outputs compared to DPO with the same data budget, and greater efficiency in terms of preference data requirements. Furthermore, we show conditions under which silver labeling is most helpful: it is most effective when training the policy with DPO, outperforming traditional PPO, and benefits from maintaining a separate discriminator from the policy model.
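For context, the DPO objective that D2PO builds on is, for one preference pair, a logistic loss on the margin between how much the policy and a frozen reference model prefer the chosen response over the rejected one. A minimal sketch with hypothetical log-probability values:

```python
import math

def dpo_loss(logp_w_policy, logp_l_policy, logp_w_ref, logp_l_ref, beta=0.1):
    """DPO loss for one preference pair: -log sigmoid(beta * margin), where
    the margin compares policy vs. reference log-probabilities of the chosen
    (w) and rejected (l) responses. Positive margin = policy already prefers
    the chosen response more strongly than the reference does."""
    margin = (logp_w_policy - logp_w_ref) - (logp_l_policy - logp_l_ref)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Hypothetical sequence log-probabilities: the policy has shifted toward the
# chosen response relative to the reference, so the loss drops below log(2)
# (the value at zero margin).
loss = dpo_loss(logp_w_policy=-5.0, logp_l_policy=-9.0,
                logp_w_ref=-6.0, logp_l_ref=-8.0, beta=0.1)
```

In D2PO the same loss is applied not only to gold preference pairs but also to pairs silver-labeled by the learned discriminator.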
https://arxiv.org/abs/2405.01511