PyPose is an open-source library for robot learning. It combines a learning-based approach with physics-based optimization, enabling seamless end-to-end robot learning. It has been adopted in many tasks thanks to its meticulously designed application programming interface (API) and efficient implementation. Since its initial launch in early 2022, PyPose has undergone significant enhancements, incorporating a wide variety of new features. To satisfy the growing demand for understanding and utilizing the library and to reduce the learning curve for new users, we present the fundamental design principle of the imperative programming interface and showcase the flexible usage of diverse functionalities and modules using an extremely simple Dubins car example. We also demonstrate that PyPose can easily be used to navigate a real quadruped robot with a few lines of code.
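For context, a Dubins car moves at constant forward speed with a bounded turn rate. The sketch below rolls out these kinematics in plain Python; it is illustrative only and does not use PyPose's API (the function name and parameter values are our own).

```python
import math

def dubins_step(x, y, theta, v, omega, dt):
    """One Euler step of the Dubins car: constant forward speed v,
    heading changed by the bounded turn rate omega."""
    return (x + v * math.cos(theta) * dt,
            y + v * math.sin(theta) * dt,
            theta + omega * dt)

state = (0.0, 0.0, 0.0)
for _ in range(100):              # 1 s of a gentle left turn
    state = dubins_step(*state, v=1.0, omega=0.5, dt=0.01)
print(state)                      # ~ (0.96, 0.24, 0.5)
```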
https://arxiv.org/abs/2309.13035
Sparse training is one of the promising techniques to reduce the computational cost of DNNs while retaining high accuracy. In particular, N:M fine-grained structured sparsity, where only N out of every M consecutive elements can be nonzero, has attracted attention due to its hardware-friendly pattern and its capability of achieving a high sparsity ratio. However, the potential to accelerate N:M sparse DNN training has not been fully exploited, and efficient hardware supporting N:M sparse training is lacking. To tackle these challenges, this paper presents a computation-efficient training scheme for N:M sparse DNNs using algorithm, architecture, and dataflow co-design. At the algorithm level, a bidirectional weight pruning method, dubbed BDWP, is proposed to leverage the N:M sparsity of weights during both the forward and backward passes of DNN training, which significantly reduces the computational cost while maintaining model accuracy. At the architecture level, a sparse accelerator for DNN training, named SAT, is developed to support both regular dense operations and computation-efficient N:M sparse operations. At the dataflow level, multiple optimization methods, including interleave mapping, pre-generation of N:M sparse weights, and offline scheduling, are proposed to boost the computational efficiency of SAT. Finally, the effectiveness of our training scheme is evaluated on a Xilinx VCU1525 FPGA card using various DNN models and datasets. Experimental results show that the SAT accelerator with the BDWP sparse training method under a 2:8 sparsity ratio achieves an average speedup of 1.75x over dense training, with a negligible accuracy loss of 0.56% on average. Furthermore, our proposed training scheme significantly improves training throughput by 2.97x to 25.22x and energy efficiency by 1.36x to 3.58x over prior FPGA-based accelerators.
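To make the sparsity pattern concrete, here is a minimal one-shot magnitude-based N:M pruning sketch in Python. It only illustrates the 2:8 pattern itself; the paper's BDWP method additionally applies N:M sparsity in both the forward and backward passes during training.

```python
import numpy as np

def nm_prune(weights, n=2, m=8):
    """Keep only the N largest-magnitude entries in every consecutive
    group of M weights and zero the rest (N:M structured sparsity)."""
    flat = weights.reshape(-1, m)                    # groups of M elements
    keep = np.argsort(-np.abs(flat), axis=1)[:, :n]  # top-N indices per group
    mask = np.zeros(flat.shape, dtype=bool)
    np.put_along_axis(mask, keep, True, axis=1)
    return (flat * mask).reshape(weights.shape)

w = np.random.randn(4, 16)        # 64 weights -> 8 groups of 8
w_sparse = nm_prune(w, n=2, m=8)  # 2:8 pattern: 75% of weights are zero
```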
https://arxiv.org/abs/2309.13015
High-quality text embedding is pivotal for improving semantic textual similarity (STS) tasks, which are crucial components in Large Language Model (LLM) applications. However, a common challenge that existing text embedding models face is the problem of vanishing gradients, primarily due to their reliance on the cosine function in the optimization objective, which has saturation zones. To address this issue, this paper proposes a novel angle-optimized text embedding model called AnglE. The core idea of AnglE is to introduce angle optimization in a complex space. This novel approach effectively mitigates the adverse effects of the saturation zones of the cosine function, which can impede gradients and hinder optimization. To set up a comprehensive STS evaluation, we experimented on existing short-text STS datasets and a newly collected long-text STS dataset from GitHub Issues. Furthermore, we examine domain-specific STS scenarios with limited labeled data and explore how AnglE works with LLM-annotated data. Extensive experiments were conducted on a variety of tasks, including short-text, long-text, and domain-specific STS. The results show that AnglE outperforms state-of-the-art (SOTA) STS models that ignore the cosine saturation zone. These findings demonstrate AnglE's ability to generate high-quality text embeddings and the usefulness of angle optimization in STS.
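The saturation issue is easy to reproduce numerically: with a cosine objective, the gradient with respect to the angle theta between two embeddings is -sin(theta), which vanishes for pairs that are already nearly parallel or nearly opposite. A small demonstration of this effect (our sketch, not the paper's code):

```python
import numpy as np

# d(cos theta)/d theta = -sin(theta): near theta = 0 or theta = pi
# (the saturation zones) the gradient vanishes, so pairs that are
# already very similar or very dissimilar stop producing signal.
for theta in [0.01, 0.5, np.pi / 2, np.pi - 0.01]:
    print(f"theta={theta:5.2f}  cos={np.cos(theta):+.4f}  grad={-np.sin(theta):+.4f}")
```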
https://arxiv.org/abs/2309.12871
The rising usage of AI- and ML-based processing across application domains has exacerbated the need for low-cost ML implementations, specifically for resource-constrained embedded systems. To this end, approximate computing, an approach that explores the trade-offs among power, performance, area (PPA), and behavioral accuracy (BEHAV), has emerged as a possible solution for implementing embedded machine learning. Due to the predominance of MAC operations in ML, designing platform-specific approximate arithmetic operators forms one of the major research problems in approximate computing. Recently, there has been rising usage of AI/ML-based design space exploration techniques for implementing approximate operators. However, most of these approaches are limited to using ML-based surrogate functions for predicting the PPA and BEHAV impact of a set of related design decisions. While this approach leverages the regression capabilities of ML methods, it does not exploit more advanced ML techniques. To this end, we propose AxOCS, a methodology for designing approximate arithmetic operators through ML-based supersampling. Specifically, we present a method to leverage the correlation of PPA and BEHAV metrics across operators of varying bit-widths to generate larger bit-width operators. The proposed approach involves traversing the relatively small design space of smaller bit-width operators and employing the associated Design-PPA-BEHAV relationship to generate initial solutions for metaheuristics-based optimization of larger operators. The experimental evaluation of AxOCS for FPGA-optimized approximate operators shows that the proposed approach significantly improves the quality (the resulting hypervolume for multi-objective optimization) of 8x8 signed approximate multipliers.
https://arxiv.org/abs/2309.12830
Robot multimodal locomotion encompasses the ability to transition between walking and flying, representing a significant challenge in robotics. This work presents an approach that enables automatic smooth transitions between legged and aerial locomotion. Leveraging the concept of Adversarial Motion Priors, our method allows the robot to imitate motion datasets and accomplish the desired task without the need for complex reward functions. The robot learns walking patterns from human-like gaits and aerial locomotion patterns from motions obtained using trajectory optimization. Through this process, the robot adapts the locomotion scheme based on environmental feedback using reinforcement learning, with the spontaneous emergence of mode-switching behavior. The results highlight the potential for achieving multimodal locomotion in aerial humanoid robotics through automatic control of walking and flying modes, paving the way for applications in diverse domains such as search and rescue, surveillance, and exploration missions. This research contributes to advancing the capabilities of aerial humanoid robots in terms of versatile locomotion in various environments.
https://arxiv.org/abs/2309.12784
Bin packing is a well-known NP-hard problem in the domain of artificial intelligence, posing significant challenges in finding efficient solutions. Meanwhile, recent advancements in quantum technologies have shown promising potential for achieving substantial computational speedups, particularly in certain problem classes such as combinatorial optimization. In this study, we introduce QAL-BP, a novel Quadratic Unconstrained Binary Optimization (QUBO) formulation designed specifically for bin packing and suitable for quantum computation. QAL-BP utilizes the augmented Lagrangian method to incorporate the bin packing constraints into the objective function, while also facilitating an analytical estimation of heuristic, but empirically robust, penalty multipliers. This approach leads to a more versatile and generalizable model that eliminates the need to empirically calculate instance-dependent Lagrangian coefficients, a requirement commonly encountered in alternative QUBO formulations for similar problems. To assess the effectiveness of our proposed approach, we conduct experiments on a set of bin-packing instances using a real quantum annealing device. Additionally, we compare the results with those obtained from two classical solvers, namely simulated annealing and Gurobi. The experimental findings not only confirm the correctness of the proposed formulation but also demonstrate the potential of quantum computation for effectively solving the bin-packing problem, particularly as more reliable quantum technology becomes available.
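To illustrate the augmented Lagrangian idea in a QUBO setting, the toy sketch below folds the "each item in exactly one bin" equality constraint into a penalized binary energy and minimizes it by brute force. The multipliers are assumed values, capacity constraints are omitted, and this is not the paper's QAL-BP formulation.

```python
import itertools

ITEMS, BINS = 3, 2
lam, rho = 1.0, 4.0   # assumed multiplier and penalty strength

def energy(bits):
    x = [bits[i * BINS:(i + 1) * BINS] for i in range(ITEMS)]  # assignments
    y = bits[ITEMS * BINS:]                                    # bin-open flags
    e = sum(y)                                                 # open-bin count
    for row in x:
        g = sum(row) - 1                    # "exactly one bin" residual
        e += lam * g + 0.5 * rho * g * g    # augmented-Lagrangian penalty
    e += rho * sum(x[i][b] * (1 - y[b])     # items only in open bins
                   for i in range(ITEMS) for b in range(BINS))
    return e

best = min(itertools.product([0, 1], repeat=ITEMS * BINS + BINS), key=energy)
print(best, energy(best))   # minimum: all items packed into a single open bin
```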
https://arxiv.org/abs/2309.12678
Designing and deriving effective model-based reinforcement learning (MBRL) algorithms with a performance improvement guarantee is challenging, mainly due to the high coupling between model learning and policy optimization. Many prior methods that rely on return discrepancy to guide model learning ignore the impact of model shift, which can lead to performance deterioration due to excessive model updates. Other methods use a performance difference bound to explicitly consider model shift. However, these methods rely on a fixed threshold to constrain model shift, resulting in heavy dependence on the threshold and a lack of adaptability during training. In this paper, we theoretically derive an optimization objective that unifies model shift and model bias, and then formulate a fine-tuning process. This process adaptively adjusts model updates to obtain a performance improvement guarantee while avoiding model overfitting. Based on these results, we develop a straightforward algorithm, USB-PO (Unified model Shift and model Bias Policy Optimization). Empirical results show that USB-PO achieves state-of-the-art performance on several challenging benchmark tasks.
https://arxiv.org/abs/2309.12671
The use of Implicit Neural Representations (INRs) through a hash-table has demonstrated impressive effectiveness and efficiency in characterizing intricate signals. However, current state-of-the-art methods exhibit insufficient regularization, often yielding unreliable and noisy results during interpolation. We find that this issue stems from broken gradient flow between input coordinates and indexed hash-keys, where the chain rule attempts to model discrete hash-keys rather than the continuous coordinates. To tackle this concern, we introduce RHINO, in which a continuous analytical function is incorporated to facilitate regularization by additionally connecting the input coordinate to the network, without modifying the architecture of current hash-based INRs. This connection ensures seamless backpropagation of gradients from the network's output back to the input coordinates, thereby enhancing regularization. Our experimental results showcase the broadened regularization capability not only across different hash-based INRs, such as DINER and Instant NGP, but also across a variety of tasks, including image fitting, representation of signed distance functions, and optimization of 5D static / 6D dynamic neural radiance fields. Notably, RHINO outperforms current state-of-the-art techniques in both quality and speed, affirming its superiority.
https://arxiv.org/abs/2309.12642
One of the problems in quantitative finance that has received the most attention is the portfolio optimization problem. It has been approached using a variety of techniques, among which those related to quantum computing have been especially prolific in recent years. In this study, we present a system called Quantum Computing-based System for Portfolio Optimization with Future Asset Values and Automatic Universe Reduction (Q4FuturePOP), which addresses the portfolio optimization problem with the following innovations: i) the developed tool is modeled to work with future predictions of assets instead of historical values; and ii) Q4FuturePOP includes an automatic universe reduction module, conceived to intelligently reduce the complexity of the problem. We also include a brief discussion of the preliminary performance of the different modules that compose the prototypical version of Q4FuturePOP.
https://arxiv.org/abs/2309.12627
Optimization-based safety filters, such as quadratic programs (QPs) based on control barrier functions (CBFs), have demonstrated success in controlling autonomous systems to achieve complex goals. These CBF-QPs can be shown to be continuous, but they are generally not smooth, let alone continuously differentiable. In this paper, we present a general characterization of smooth safety filters, i.e., smooth controllers that guarantee safety in a minimally invasive fashion, based on the Implicit Function Theorem. This characterization leads to families of smooth universal formulas for safety-critical controllers that quantify the conservatism of the resulting safety filter, whose utility is demonstrated through illustrative examples.
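For reference, the single-constraint CBF-QP admits the well-known closed-form solution sketched below; the max(0, .) projection is exactly where continuity holds but smoothness fails, which is the issue the paper's smooth characterization addresses. The example values are our own.

```python
import numpy as np

def cbf_qp_filter(u_nom, a, b):
    """Closed-form solution of the single-constraint CBF-QP
        min ||u - u_nom||^2  s.t.  a + b @ u >= 0,
    with a = Lf h(x) + alpha(h(x)) and b = Lg h(x).  The max(0, .)
    keeps the filter continuous but non-differentiable at the
    constraint boundary."""
    violation = max(0.0, -(a + b @ u_nom))
    return u_nom + violation * b / (b @ b)

u_nom = np.array([1.0, 0.0])              # nominal control (illustrative)
b = np.array([0.0, 1.0])
print(cbf_qp_filter(u_nom, a=-0.2, b=b))  # -> [1.0, 0.2], minimal correction
```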
https://arxiv.org/abs/2309.12614
Recent transportation research suggests that autonomous vehicles (AVs) have the potential to improve traffic flow efficiency, as they are able to maintain smaller car-following distances. Nevertheless, being a unique class of ground robots, AVs are susceptible to robotic errors, particularly in their perception modules, leading to uncertainty in their movements and an increased risk of collision. Consequently, conservative operational strategies, such as larger headways and slower speeds, are implemented to prioritize safety over traffic capacity in real-world operations. To reconcile this inconsistency, this paper proposes an analytical model framework that delineates the endogenous reciprocity between traffic safety and efficiency arising from robotic uncertainty in AVs. Car-following scenarios are extensively examined, with uncertain headway as the key parameter bridging single-lane capacity and collision probability. A Markov chain is then introduced to describe the dynamics of lane capacity, and the resulting expected collision-inclusive capacity is adopted as the ultimate performance measure for fully autonomous traffic. With the help of this analytical model, it is possible to support the setting of critical parameters in AV operations and to incorporate optimization techniques into traffic management strategies for autonomous traffic.
https://arxiv.org/abs/2309.12611
This paper presents a tutorial overview of path integral (PI) control approaches for stochastic optimal control and trajectory optimization. We concisely summarize the theoretical development of path integral control for computing solutions to stochastic optimal control problems, and provide algorithmic descriptions of the cross-entropy (CE) method, an open-loop controller using the receding-horizon scheme known as model predictive path integral (MPPI) control, and a parameterized state-feedback controller based on path integral control theory. We discuss policy search methods based on path integral control, efficient and stable sampling strategies, extensions to multi-agent decision-making, and MPPI for trajectory optimization on manifolds. For tutorial demonstrations, several PI-based controllers are implemented in MATLAB and ROS2/Gazebo simulations for trajectory optimization. The simulation frameworks and source code are publicly available at this https URL.
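As a taste of the tutorial material, here is a minimal MPPI-style update in Python (our simplification, not the paper's reference implementation): sample noisy control sequences, roll them out, and average the noise with exponential weights on the trajectory costs.

```python
import numpy as np

def mppi(x0, u_init, dynamics, cost, samples=256, sigma=0.5, lam=1.0):
    """One MPPI update: perturb the nominal control sequence with
    Gaussian noise, roll out the dynamics, and exponentially weight
    the perturbations by their trajectory costs."""
    horizon = len(u_init)
    eps = np.random.randn(samples, horizon) * sigma
    S = np.zeros(samples)
    for k in range(samples):
        x = np.array(x0, dtype=float)
        for t in range(horizon):
            x = dynamics(x, u_init[t] + eps[k, t])
            S[k] += cost(x)
    w = np.exp(-(S - S.min()) / lam)
    w /= w.sum()
    return u_init + w @ eps           # importance-weighted noise average

# Toy 1-D double integrator driven toward the origin (illustrative only).
dyn = lambda x, u: np.array([x[0] + 0.1 * x[1], x[1] + 0.1 * u])
cost = lambda x: x[0] ** 2 + 0.1 * x[1] ** 2
u = np.zeros(30)
for _ in range(20):                   # iteratively refine the control sequence
    u = mppi([1.0, 0.0], u, dyn, cost)
```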
https://arxiv.org/abs/2309.12566
We propose a risk-aware crash mitigation system (RCMS) to augment any existing motion planner (MP), enabling an autonomous vehicle to perform evasive maneuvers in high-risk situations and to minimize the severity of a collision if a crash is inevitable. To facilitate a smooth transition between RCMS and MP, we develop a novel activation mechanism that combines instantaneous as well as predictive collision-risk evaluation strategies in a unified hysteresis-band approach. For trajectory planning, we deploy a modular receding-horizon optimization-based approach that minimizes a smooth situational risk profile while adhering to physical road limits as well as vehicle actuator limits. We demonstrate the performance of our approach in a simulation environment.
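The hysteresis-band idea can be sketched in a few lines: activate RCMS when risk crosses an upper threshold and return to MP only after risk falls below a lower one, preventing rapid mode chattering. The thresholds and the scalar risk signal below are our assumptions for illustration.

```python
class HysteresisActivation:
    """Switch from the nominal planner (MP) to the mitigation system
    (RCMS) when risk exceeds `high`, and hand control back only once
    risk drops below `low`, avoiding rapid mode chattering."""

    def __init__(self, low=0.3, high=0.7):
        self.low, self.high = low, high
        self.rcms_active = False

    def update(self, risk):
        if not self.rcms_active and risk > self.high:
            self.rcms_active = True
        elif self.rcms_active and risk < self.low:
            self.rcms_active = False
        return self.rcms_active
```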
https://arxiv.org/abs/2309.12531
Throughout their long history, natural species have learned to survive by evolving physical structures adaptive to environmental changes. In contrast, current reinforcement learning (RL) studies mainly focus on training an agent with a fixed morphology (e.g., skeletal structure and joint attributes) in a fixed environment, which can hardly generalize to changing environments or new tasks. In this paper, we optimize an RL agent and its morphology through "morphology-environment co-evolution" (MECE), in which the morphology keeps being updated to adapt to the changing environment, while the environment is modified progressively to bring new challenges and stimulate the improvement of the morphology. This leads to a curriculum for training generalizable RL agents whose morphology and policy are optimized for different environments. Instead of hand-crafting the curriculum, we train two policies to automatically change the morphology and the environment. To this end, (1) we develop two novel and effective rewards for these policies, based solely on the learning dynamics of the RL agent; and (2) we design a scheduler to automatically determine when to change the environment and the morphology. In experiments on two classes of tasks, the morphologies and RL policies trained via MECE exhibit significantly better generalization in unseen test environments than SOTA morphology optimization methods. Our ablation studies on the two MECE policies further show that co-evolution between morphology and environment is the key to success.
https://arxiv.org/abs/2309.12529
This paper introduces a new approach to generating strongly constrained texts. We consider standardized sentence generation for the typical application of vision screening. To solve this problem, we formalize it as a discrete combinatorial optimization problem and utilize multivalued decision diagrams (MDDs), a well-known data structure for handling constraints. In our context, one key strength of MDDs is the ability to compute an exhaustive set of solutions without performing any search. Once the sentences are obtained, we apply a language model (GPT-2) to keep the best ones. We detail this for English and also for French, where the agreement and conjugation rules are known to be more complex. Finally, with the help of GPT-2, we obtain hundreds of bona fide candidate sentences. Compared with the few dozen sentences usually available in the well-known vision screening test (MNREAD), this represents a major breakthrough in the field of standardized sentence generation. Moreover, as the approach can easily be adapted to other languages, it has the potential to make the MNREAD test even more valuable and usable. More generally, this paper highlights MDDs as a convincing alternative for constrained text generation, especially when the constraints are hard to satisfy, and for many other use cases.
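As a toy illustration of the diagram idea (not the paper's implementation), the sketch below builds a layered enumeration over sentence slots whose state is the running character count; every complete path satisfying an exact-length constraint is produced exhaustively. The vocabularies and target length are assumed values.

```python
# Slot vocabularies and an exact total-letter-count constraint (toy values).
SLOTS = [["the", "a"], ["red", "small", "old"], ["cat", "house"]]
TARGET = 11

def enumerate_paths(slot=0, used=0, prefix=()):
    """Walk the layered diagram: nodes are (slot, letters-used) pairs,
    arcs are word choices; complete paths hitting TARGET are solutions."""
    if slot == len(SLOTS):
        if used == TARGET:
            yield " ".join(prefix)
        return
    for word in SLOTS[slot]:
        if used + len(word) <= TARGET:  # prune arcs that overshoot
            yield from enumerate_paths(slot + 1, used + len(word), prefix + (word,))

print(list(enumerate_paths()))
# ['the red house', 'the small cat', 'the old house', 'a small house']
```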
https://arxiv.org/abs/2309.12415
Despite large advances in recent years, real-time capable motion planning for autonomous road vehicles remains a major challenge. In this work, we present a decision module based on set-based reachability analysis: first, we identify all possible driving corridors by computing the reachable set for the longitudinal position of the vehicle along the lanelets of the road network, with lane changes modeled as discrete events. Next, we select the best driving corridor based on a cost function that penalizes lane changes and deviations from a desired velocity profile. Finally, we generate a reference trajectory inside the selected driving corridor, which can be used to guide or warm-start low-level trajectory planners. For the numerical evaluation, we combine our decision module with a motion-primitive-based and an optimization-based planner, and evaluate the performance on 2000 challenging CommonRoad traffic scenarios as well as in the realistic CARLA simulator. The results demonstrate that our decision module is real-time capable and yields significant speed-ups compared to executing a motion planner standalone.
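A possible toy version of the corridor-selection cost (our assumption of its shape, with illustrative weights and fields): penalize each lane change and the squared deviation from the desired velocity profile, then keep the cheapest corridor.

```python
import numpy as np

def corridor_cost(corridor, v_des, w_lc=5.0, w_v=1.0):
    """Cost of a driving corridor: a fixed penalty per lane change plus
    the squared deviation from the desired velocity profile."""
    dev = np.asarray(corridor["velocity"]) - v_des
    return w_lc * corridor["lane_changes"] + w_v * np.sum(dev ** 2)

corridors = [
    {"lane_changes": 0, "velocity": [10, 9, 8, 8]},     # stay behind, slow down
    {"lane_changes": 1, "velocity": [10, 10, 10, 10]},  # change lane, keep speed
]
best = min(corridors, key=lambda c: corridor_cost(c, v_des=10.0))
print(best)  # the lane change (cost 5.0) beats slowing down (cost 9.0)
```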
https://arxiv.org/abs/2309.12289
This work investigates a case study of physics-based sonification of Quadratic Unconstrained Binary Optimization (QUBO) problems optimized by the Variational Quantum Eigensolver (VQE) algorithm. The VQE approximates the solution of the problem using an iterative loop between the quantum computer and a classical optimization routine. This work explores the intermediate statevectors found in each VQE iteration as the means of sonifying the optimization process itself. The implementation was realised in the form of a musical interface prototype named the Variational Quantum Harmonizer (VQH), providing potential design strategies for musical applications, focusing on chords, chord progressions, and arpeggios. The VQH can be used both to enhance data visualization and to create artistic pieces. The methodology is also relevant to how an artist would gain intuition towards achieving a desired musical sound by carefully designing QUBO cost functions. Flexible mapping strategies could supply a broad portfolio of sounds for QUBO- and quantum-inspired musical compositions, as demonstrated in a case study composition, "Dependent Origination" by Peter Thomas and Paulo Itaborai.
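One conceivable mapping in this spirit (our toy sketch, not the VQH implementation): treat the squared amplitudes of each intermediate statevector as note intensities over a fixed scale, so the optimization trajectory unfolds as a chord progression.

```python
import numpy as np

SCALE = ["C", "E", "G", "B"]          # one note per basis state (2 qubits)

def statevector_to_chord(psi, threshold=0.1):
    """Map measurement probabilities |psi_i|^2 to (note, intensity)
    pairs, dropping basis states below an audibility threshold."""
    probs = np.abs(psi) ** 2
    return [(note, round(p, 2)) for note, p in zip(SCALE, probs) if p > threshold]

psi = np.array([0.7, 0.1, 0.7, 0.1])
psi = psi / np.linalg.norm(psi)       # normalize like a real statevector
print(statevector_to_chord(psi))      # -> [('C', 0.49), ('G', 0.49)]
```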
https://arxiv.org/abs/2309.12254
Thanks to the mobility of their base, mobile manipulators have been employed in many applications that would otherwise require multiple fixed-base robots or a large-size system. However, the mobile base also introduces redundancy into the system, which makes trajectory planning more challenging. One class of problems recently arising from mobile 3D printing is trajectory-continuous tasks, in which the end-effector is required to follow a designed continuous trajectory (a time-parametrized path) in task space. This paper formulates and solves the optimal trajectory planning problem for mobile manipulators under an end-effector trajectory continuity constraint, which allows other constraints and trajectory optimization to be considered. To demonstrate our method, a discrete optimal trajectory planning algorithm is proposed and applied to mobile 3D printing tasks in multiple experiments.
https://arxiv.org/abs/2309.12251
Tumor segmentation in medical imaging is crucial and relies on precise delineation. Fluorodeoxyglucose Positron-Emission Tomography (FDG-PET) is widely used in clinical practice to detect metabolically active tumors. However, FDG-PET scans may misinterpret irregular glucose consumption in healthy or benign tissues as cancer. Combining PET with Computed Tomography (CT) can enhance tumor segmentation by integrating metabolic and anatomic information. FDG-PET/CT scans are pivotal for cancer staging and reassessment, utilizing radiolabeled fluorodeoxyglucose to highlight metabolically active regions. Accurately distinguishing tumor-specific uptake from physiological uptake in normal tissues is a challenging aspect of precise tumor segmentation. The AutoPET challenge addresses this by providing a dataset of 1014 FDG-PET/CT studies, encouraging advancements in accurate tumor segmentation and analysis within the FDG-PET/CT domain. Code: this https URL
https://arxiv.org/abs/2309.12114
Multi-task learning (MTL) has shown great potential in medical image analysis, improving the generalizability of the learned features and the performance on individual tasks. However, most work on MTL focuses on either architecture design or gradient manipulation, and in both scenarios features are learned in a competitive manner. In this work, we propose to formulate MTL as a multi/bi-level optimization problem, thereby forcing features to be learned from each task in a cooperative manner. Specifically, we alternately update the sub-model for each task, taking advantage of the learned sub-models of the other tasks. To alleviate the negative transfer problem during optimization, we search for flat minima of the current objective function with regard to features from other tasks. To demonstrate the effectiveness of the proposed approach, we validate our method on three publicly available datasets. The proposed method shows the advantage of cooperative learning and yields promising results when compared with state-of-the-art MTL approaches. The code will be available online.
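A minimal alternating-update sketch of the cooperative scheme (an assumed setup that omits the paper's flat-minima search): a shared encoder with one head per task, where each step optimizes one task's head together with the shared features while the other heads stay fixed.

```python
import torch
import torch.nn as nn

# Shared encoder with one linear head per task (toy dimensions).
encoder = nn.Linear(16, 8)
heads = [nn.Linear(8, 1) for _ in range(2)]
opts = [torch.optim.SGD(list(encoder.parameters()) + list(h.parameters()), lr=0.01)
        for h in heads]

x = torch.randn(32, 16)                        # shared inputs (dummy data)
ys = [torch.randn(32, 1), torch.randn(32, 1)]  # one target per task

for step in range(100):
    t = step % 2                               # alternate between the two tasks
    loss = nn.functional.mse_loss(heads[t](encoder(x)), ys[t])
    opts[t].zero_grad()
    loss.backward()
    opts[t].step()
```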
https://arxiv.org/abs/2309.12090