While originally developed for continuous control problems, Proximal Policy Optimization (PPO) has emerged as the workhorse of a variety of reinforcement learning (RL) applications, including the fine-tuning of generative models. Unfortunately, PPO requires multiple heuristics to enable stable convergence (e.g., value networks, clipping) and is notorious for its sensitivity to the precise implementation of these components. In response, we take a step back and ask what a minimalist RL algorithm for the era of generative models would look like. We propose REBEL, an algorithm that cleanly reduces policy optimization to regressing the relative reward between two completions to a prompt via a direct policy parameterization, enabling a strikingly lightweight implementation. In theory, we prove that fundamental RL algorithms like Natural Policy Gradient can be seen as variants of REBEL, which allows us to match the strongest known convergence and sample-complexity guarantees in the RL literature. REBEL can also cleanly incorporate offline data and handle the intransitive preferences we frequently see in practice. Empirically, we find that REBEL provides a unified approach to language modeling and image generation with performance that is stronger than or comparable to PPO and DPO, all while being simpler to implement and more computationally tractable than PPO.
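The "regress relative rewards" idea can be sketched as a simple least-squares objective. This is an illustrative scalar-valued sketch, not the paper's implementation; the function names and the two-completion setup are assumptions:

```python
import numpy as np

def rebel_loss(logp_new, logp_old, rewards, eta=1.0):
    """Illustrative REBEL-style regression for one prompt with two completions.

    logp_new / logp_old: length-2 arrays of log pi(y|x) under the current and
    previous policy; rewards: length-2 array of scalar rewards for the pair.
    """
    # Difference of log-probability ratios between the two completions.
    ratio_diff = (logp_new[0] - logp_old[0]) - (logp_new[1] - logp_old[1])
    # Regress it onto the scaled reward difference.
    target = (rewards[0] - rewards[1]) / eta
    return (ratio_diff - target) ** 2
```

The loss is zero exactly when the policy's log-ratio gap matches the (scaled) reward gap, which is what makes the method a pure regression with no value network or clipping.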
https://arxiv.org/abs/2404.16767
This paper addresses the task of 3D clothed human generation from textual descriptions. Previous works usually encode the human body and clothes as a holistic model and generate the whole model in a single-stage optimization, which makes them struggle with clothing editing and lose fine-grained control over the generation process. To solve this, we propose a layer-wise clothed human representation combined with a progressive optimization strategy, which produces clothing-disentangled 3D human models while providing control over the generation process. The basic idea is to progressively generate a minimally clothed human body and layer-wise clothes. During clothing generation, a novel stratified compositional rendering method is proposed to fuse multi-layer human models, and a new loss function is utilized to help decouple the clothing model from the human body. The proposed method achieves high-quality disentanglement, thereby providing an effective way to generate 3D garments. Extensive experiments demonstrate that our approach achieves state-of-the-art 3D clothed human generation while also supporting cloth-editing applications such as virtual try-on. Project page: this http URL
https://arxiv.org/abs/2404.16748
Navigating mobile robots in social environments remains a challenging task due to the intricacies of human-robot interaction. Most motion planners designed for crowded, dynamic environments focus on choosing the best velocity to reach the goal while avoiding collisions, but do not explicitly consider high-level navigation behavior (passing on the left or the right, letting others pass or passing before them, etc.). In this work, we present a novel motion planner that incorporates topologically distinct paths representing diverse navigation strategies around humans. The planner selects the topology class that best imitates human behavior using a deep neural network trained on real-world human motion data, ensuring socially intelligent and contextually aware navigation. Our system refines the chosen path in real time through an optimization-based local planner, ensuring seamless adherence to the desired social behavior. In this way, we decouple perception and local planning from the decision-making process. We evaluate the prediction accuracy of the network with real-world data. In addition, we assess the planner's navigation capabilities in both simulation and on a real-world platform, comparing it with other state-of-the-art planners. We demonstrate that our planner exhibits socially desirable behaviors and delivers smooth, strong performance.
https://arxiv.org/abs/2404.16705
Few-shot image synthesis entails generating diverse and realistic images of novel categories using only a few example images. While multiple recent efforts in this direction have achieved impressive results, existing approaches depend only on the few novel samples available at test time to generate new images, which restricts the diversity of the generated images. To overcome this limitation, we propose Conditional Distribution Modelling (CDM), a framework that effectively utilizes diffusion models for few-shot image generation. By modelling the distribution of the latent space used to condition a diffusion process, CDM leverages the learnt statistics of the training data to better approximate the unseen class distribution, thereby removing the bias arising from the limited number of few-shot samples. Simultaneously, we devise a novel inversion-based optimization strategy that further improves the approximated unseen class distribution and ensures the fidelity of the generated samples to the unseen class. Experimental results on four benchmark datasets demonstrate the effectiveness of the proposed CDM for few-shot generation.
https://arxiv.org/abs/2404.16556
Model-free reinforcement learning methods lack an inherent mechanism to impose behavioural constraints on the trained policies. While certain extensions exist, they remain limited to specific types of constraints, such as value constraints with additional reward signals or visitation density constraints. In this work we unify these existing techniques and bridge the gap with classical optimization and control theory, using a generic primal-dual framework for value-based and actor-critic reinforcement learning methods. The resulting dual formulations turn out to be especially useful for imposing additional constraints on the learned policy, as an intrinsic relationship between such dual constraints (or regularization terms) and reward modifications in the primal is revealed. Furthermore, using this framework, we can introduce novel types of constraints, allowing us to impose bounds on the policy's action density or on costs associated with transitions between consecutive states and actions. From the adjusted primal-dual optimization problems, we derive a practical algorithm that supports various combinations of policy constraints, handled automatically throughout training via trainable reward modifications. The resulting $\texttt{DualCRL}$ method is examined in detail and evaluated under different (combinations of) constraints in two interpretable environments. The results highlight the efficacy of the method, which ultimately provides the designer of such systems with a versatile toolbox of possible policy constraints.
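The primal-dual connection between constraints and reward modifications can be sketched with textbook Lagrangian dual ascent. This is a generic illustration of the principle, not the DualCRL algorithm; the function name and the single-step scalar setting are assumptions:

```python
def dual_ascent_step(reward, cost, budget, lam, lr=0.1):
    """One primal-dual step: penalize the reward with a trainable multiplier
    and move the multiplier toward enforcing E[cost] <= budget."""
    shaped_reward = reward - lam * cost           # reward modification in the primal
    lam = max(0.0, lam + lr * (cost - budget))    # dual ascent on the constraint violation
    return shaped_reward, lam
```

When the observed cost exceeds the budget, the multiplier grows and the shaped reward increasingly discourages costly behaviour; when the constraint is satisfied, the multiplier decays toward zero, recovering the unconstrained objective.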
https://arxiv.org/abs/2404.16468
Despite the remarkable success of deep learning in medical imaging analysis, medical image segmentation remains challenging due to the scarcity of high-quality labeled images for supervision. Further, the significant domain gap between natural and medical images in general, and ultrasound images in particular, hinders fine-tuning models trained on natural images for the task at hand. In this work, we address the performance degradation of segmentation models in low-data regimes and propose a prompt-less segmentation method that harnesses the ability of segmentation foundation models to segment abstract shapes. We do so via a novel prompt-point generation algorithm that uses coarse semantic segmentation masks as input and a zero-shot promptable foundation model as an optimization target. We demonstrate our method on a findings segmentation task (pathologic anomalies) in ultrasound images. Our method's advantages come to light in experiments at varying degrees of the low-data regime on a small-scale musculoskeletal ultrasound image dataset, yielding a larger performance gain as the training set size decreases.
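Going from a coarse mask to a prompt point can be illustrated with a simple centroid heuristic. This is an assumed toy baseline, not the paper's learned prompt-point generation algorithm:

```python
import numpy as np

def prompt_point_from_mask(mask):
    """Pick a single prompt point for a promptable segmentation model as the
    centroid of a coarse binary mask (illustrative heuristic only)."""
    ys, xs = np.nonzero(mask)
    if len(ys) == 0:
        return None  # empty mask: nothing to prompt
    return (int(round(xs.mean())), int(round(ys.mean())))
```

A real system would need to handle non-convex regions (where the centroid can fall outside the mask), which is one reason to optimize prompt points against the foundation model's output rather than rely on a fixed rule.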
https://arxiv.org/abs/2404.16325
Data augmentation plays a pivotal role in enhancing and diversifying training data. Nonetheless, consistently improving model performance in varied learning scenarios, especially those with inherent data biases, remains challenging. To address this, we propose to augment the deep features of samples by incorporating their adversarial and anti-adversarial perturbation distributions, enabling adaptive adjustment in the learning difficulty tailored to each sample's specific characteristics. We then theoretically reveal that our augmentation process approximates the optimization of a surrogate loss function as the number of augmented copies increases indefinitely. This insight leads us to develop a meta-learning-based framework for optimizing classifiers with this novel loss, introducing the effects of augmentation while bypassing the explicit augmentation process. We conduct extensive experiments across four common biased learning scenarios: long-tail learning, generalized long-tail learning, noisy label learning, and subpopulation shift learning. The empirical results demonstrate that our method consistently achieves state-of-the-art performance, highlighting its broad adaptability.
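The feature-level augmentation idea can be sketched as sampling signed perturbations along an adversarial direction. This is a naive sampling sketch under assumed names; the paper instead models the perturbation distributions analytically and optimizes a surrogate loss so no explicit copies are generated:

```python
import numpy as np

def augment_features(feat, grad, sigma=0.1, n_copies=4, rng=None):
    """Perturb a deep feature vector along the (normalized) loss-gradient
    direction with random signed magnitudes: positive magnitudes are
    adversarial (harder), negative ones anti-adversarial (easier)."""
    rng = rng or np.random.default_rng(0)
    direction = grad / (np.linalg.norm(grad) + 1e-8)
    eps = rng.normal(0.0, sigma, size=n_copies)  # signed perturbation magnitudes
    return feat[None, :] + eps[:, None] * direction[None, :]
```

Tuning `sigma` per sample is what lets the learning difficulty adapt to each sample's characteristics; letting `n_copies` grow without bound is the limit the paper's surrogate loss approximates.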
https://arxiv.org/abs/2404.16307
Text-conditioned image-to-video generation (TI2V) aims to synthesize a realistic video starting from a given image (e.g., a woman's photo) and a text description (e.g., "a woman is drinking water."). Existing TI2V frameworks often require costly training on video-text datasets and specific model designs for text and image conditioning. In this paper, we propose TI2V-Zero, a zero-shot, tuning-free method that empowers a pretrained text-to-video (T2V) diffusion model to be conditioned on a provided image, enabling TI2V generation without any optimization, fine-tuning, or introducing external modules. Our approach leverages a pretrained T2V diffusion foundation model as the generative prior. To guide video generation with the additional image input, we propose a "repeat-and-slide" strategy that modulates the reverse denoising process, allowing the frozen diffusion model to synthesize a video frame-by-frame starting from the provided image. To ensure temporal continuity, we employ a DDPM inversion strategy to initialize Gaussian noise for each newly synthesized frame and a resampling technique to help preserve visual details. We conduct comprehensive experiments on both domain-specific and open-domain datasets, where TI2V-Zero consistently outperforms a recent open-domain TI2V model. Furthermore, we show that TI2V-Zero can seamlessly extend to other tasks such as video infilling and prediction when provided with more images. Its autoregressive design also supports long video generation.
https://arxiv.org/abs/2404.16306
Complex single-objective bounded problems are often difficult to solve. Within evolutionary computation, the differential evolution (DE) algorithm has been widely studied and developed since its proposal in 1997, owing to its simplicity and efficiency. These developments include various adaptive strategies, operator improvements, and the introduction of other search methods. Since 2014, research building on LSHADE has also been widely pursued. However, although recently proposed improvement strategies outperform their predecessors, adding every new strategy does not necessarily yield the strongest performance. We therefore recombine effective advances from recent advanced DE variants and determine an effective combination scheme that further improves the performance of differential evolution. In this paper, we propose a strategy-recombination-and-reconstruction algorithm called reconstructed differential evolution (RDE) to solve single-objective bounded optimization problems. On the benchmark suite of the 2024 IEEE Congress on Evolutionary Computation (CEC2024), we tested RDE and several other advanced differential evolution variants. The experimental results show that RDE has superior performance in solving complex optimization problems.
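For context, the 1997 baseline the abstract builds on is classic DE/rand/1/bin; RDE layers its recombined strategies on top of this loop. A minimal sketch (the function names are ours, not the paper's):

```python
import numpy as np

def de_step(pop, fitness, f=0.5, cr=0.9, rng=None):
    """One generation of classic DE/rand/1/bin with greedy selection."""
    rng = rng or np.random.default_rng(0)
    n, d = pop.shape
    new_pop = pop.copy()
    for i in range(n):
        # Three distinct random individuals, none equal to i.
        a, b, c = rng.choice([j for j in range(n) if j != i], 3, replace=False)
        mutant = pop[a] + f * (pop[b] - pop[c])        # differential mutation
        cross = rng.random(d) < cr
        cross[rng.integers(d)] = True                   # guarantee one mutated gene
        trial = np.where(cross, mutant, pop[i])        # binomial crossover
        if fitness(trial) <= fitness(pop[i]):          # greedy selection
            new_pop[i] = trial
    return new_pop

sphere = lambda x: float(np.sum(x ** 2))  # toy objective
```

Greedy selection makes the best fitness in the population non-increasing, which is why even this bare-bones loop reliably descends on simple objectives; adaptive variants such as LSHADE tune `f`, `cr`, and the population size online.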
https://arxiv.org/abs/2404.16280
As one of the emerging challenges in Automated Machine Learning, Hardware-aware Neural Architecture Search (HW-NAS) tasks can be treated as black-box multi-objective optimization problems (MOPs). An important application of HW-NAS is real-time semantic segmentation, which plays a pivotal role in autonomous driving scenarios. HW-NAS for real-time semantic segmentation inherently needs to balance multiple optimization objectives, including model accuracy, inference speed, and hardware-specific considerations. Despite its importance, no benchmarks have yet been developed to frame this challenging task as multi-objective optimization. To bridge the gap, we introduce a tailored pipeline to transform the task of HW-NAS for real-time semantic segmentation into standard MOPs. Building upon this pipeline, we present a benchmark test suite, CitySeg/MOP, comprising fifteen MOPs derived from the Cityscapes dataset. The CitySeg/MOP test suite is integrated into the EvoXBench platform to provide seamless interfaces with various programming languages (e.g., Python and MATLAB) for instant fitness evaluations. We comprehensively assessed the CitySeg/MOP test suite with various multi-objective evolutionary algorithms, showcasing its versatility and practicality. Source codes are available at this https URL.
https://arxiv.org/abs/2404.16266
Open-source simulation tools play a crucial role in letting neuromorphic application engineers and hardware architects investigate performance bottlenecks and explore design optimizations before committing to silicon. Reconfigurable Architecture for Neuromorphic Computing (RANC) is one such tool: it offers the ability to execute pre-trained Spiking Neural Network (SNN) models within a unified ecosystem through both software-based simulation and FPGA-based emulation. With its flexible and highly parameterized design, RANC has been used by the community to study implementation bottlenecks, tune architectural parameters or modify neuron behavior based on application insights, and explore the trade space between hardware performance and network accuracy. In designing architectures for neuromorphic computing, there is an enormous number of configuration parameters, such as the number and precision of weights per neuron, neuron and axon counts per core, network topology, and neuron behavior. To accelerate such studies and provide users with streamlined, productive design-space exploration, in this paper we introduce a GPU-based implementation of RANC. We summarize our parallelization approach and quantify the speedups achieved with GPU-based tick-accurate simulations across various use cases. We demonstrate up to a 780x speedup over the serial version of the RANC simulator on a 512-core neuromorphic MNIST inference application. We believe the RANC ecosystem now provides a much more feasible avenue for research that explores different optimizations for accelerating SNNs and performs richer studies, by enabling rapid convergence to optimized neuromorphic architectures.
https://arxiv.org/abs/2404.16208
Recent work has developed optimization procedures to find token sequences, called adversarial triggers, which can elicit unsafe responses from aligned language models. These triggers are believed to be universally transferable, i.e., a trigger optimized on one model can jailbreak other models. In this paper, we concretely show that such adversarial triggers are not universal. We extensively investigate trigger transfer amongst 13 open models and observe inconsistent transfer. Our experiments further reveal a significant difference in robustness to adversarial triggers between models Aligned by Preference Optimization (APO) and models Aligned by Fine-Tuning (AFT). We find that APO models are extremely hard to jailbreak even when the trigger is optimized directly on the model. On the other hand, while AFT models may appear safe on the surface, exhibiting refusals to a range of unsafe instructions, we show that they are highly susceptible to adversarial triggers. Lastly, we observe that most triggers optimized on AFT models also generalize to new unsafe instructions from five diverse domains, further emphasizing their vulnerability. Overall, our work highlights the need for more comprehensive safety evaluations for aligned language models.
https://arxiv.org/abs/2404.16020
In this paper, we propose a quantum-computing-oriented benchmark for combinatorial optimization. The benchmark, coined QOPTLib, comprises 40 instances equally distributed over four well-known problems: the Traveling Salesman Problem, the Vehicle Routing Problem, the one-dimensional Bin Packing Problem, and the Maximum Cut Problem. The instance sizes in QOPTLib correspond not only to computationally addressable sizes, but also to the largest sizes approachable with a non-zero likelihood of obtaining a good result; in this regard, hybrid approaches are also taken into consideration. This benchmark thus constitutes a first effort to provide users with a general-purpose dataset. We also present a first complete solution of QOPTLib using two solvers based on quantum annealing. Our main intention is to establish a preliminary baseline, hoping to inspire other researchers to beat these outcomes with newly proposed quantum-based algorithms.
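Annealing-based solvers consume such problems in QUBO form. As an example of the standard reduction (not taken from QOPTLib itself), Max-Cut on a graph maps to a QUBO whose minimum over binary vectors equals minus the maximum cut:

```python
import numpy as np

def maxcut_qubo(edges, n):
    """Build the symmetric QUBO matrix Q with x^T Q x = -cut(x) for binary x,
    so minimizing the QUBO maximizes the cut.

    Derivation: cut(x) = sum over edges (i,j) of x_i + x_j - 2*x_i*x_j,
    so set Q[i,i] = -deg(i) and Q[i,j] = Q[j,i] = +1 per edge.
    """
    q = np.zeros((n, n))
    for i, j in edges:
        q[i, i] -= 1.0   # -deg(i) accumulates on the diagonal
        q[j, j] -= 1.0
        q[i, j] += 1.0   # symmetric off-diagonal: contributes 2*x_i*x_j
        q[j, i] += 1.0
    return q
```

For a triangle, putting one vertex on its own side cuts two edges, and indeed `x^T Q x = -2` for `x = [1, 0, 0]`.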
https://arxiv.org/abs/2404.15852
Quality-Diversity (QD) approaches are a promising direction for developing open-ended processes, as they can discover archives of high-quality solutions across diverse niches. While already successful in many applications, QD approaches usually combine only one or two solutions to generate new candidate solutions. As observed in open-ended processes such as technological evolution, wisely combining a large diversity of these solutions could lead to more innovative solutions and potentially boost the productivity of QD search. In this work, we propose to exploit the pattern-matching capabilities of generative models to enable such efficient solution combinations. We introduce In-context QD, a framework of techniques that aims to elicit the in-context capabilities of pre-trained Large Language Models (LLMs) to generate interesting solutions using the QD archive as context. Applied to a series of common QD domains, In-context QD displays promising results compared to both QD baselines and similar strategies developed for single-objective optimization. This result also holds across multiple model parameter sizes and archive population sizes, as well as across domains with distinct characteristics, from BBO functions to policy search. Finally, we perform an extensive ablation that highlights the key prompt-design considerations that encourage the generation of promising solutions for QD.
https://arxiv.org/abs/2404.15794
This paper addresses the challenges associated with hyperspectral image (HSI) reconstruction from miniaturized satellites, which often suffer from stripe effects and are computationally resource-limited. We propose a Real-Time Compressed Sensing (RTCS) network designed to be lightweight and to require relatively few training samples for efficient and robust HSI reconstruction in the presence of the stripe effect and under noisy transmission conditions. The RTCS network features a simplified architecture that reduces the required training samples and allows for easy implementation on integer-8-based encoders, facilitating rapid compressed sensing of stripe-like HSI, which matches the modest push-broom scanning design of miniaturized satellites. This contrasts with optimization-based models that demand high-precision floating-point operations, making them difficult to deploy on edge devices. Our encoder employs an integer-8-compatible linear projection for stripe-like HSI data transmission, ensuring real-time compressed sensing. Furthermore, based on a novel two-stream architecture, we propose an efficient HSI restoration decoder for the receiver side, allowing reconstruction on edge devices without a sophisticated central server. This is particularly crucial as the increasing number of miniaturized satellites would otherwise demand significant computing resources at the ground station. Extensive experiments validate the superior performance of our approach, offering new and vital capabilities for existing miniaturized satellite systems.
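The integer-8 measurement step can be sketched as a fixed int8 sensing matrix applied to int8 pixel data, with accumulation widened to avoid overflow. This is a generic compressed-sensing sketch under assumed names, not the RTCS encoder itself:

```python
import numpy as np

def int8_encoder(x, phi):
    """Integer-only compressed-sensing measurement y = phi @ x.

    x:   int8 signal vector (e.g., one stripe of HSI data)
    phi: int8 sensing matrix with fewer rows than columns
    """
    assert phi.dtype == np.int8 and x.dtype == np.int8
    # Widen to int32 before multiplying: int8 products can exceed [-128, 127].
    return phi.astype(np.int32) @ x.astype(np.int32)
```

Keeping the encoder to integer arithmetic is what makes it deployable on the satellite's edge hardware; the floating-point restoration decoder then runs on the receiver side.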
https://arxiv.org/abs/2404.15781
Face Recognition Systems (FRS) are widely used in commercial environments, such as e-commerce and e-banking, owing to their high accuracy in real-world conditions. However, these systems are vulnerable to facial morphing attacks, which are generated by blending face color images of different subjects. This paper presents a new method for generating 3D face morphs from two bona fide point clouds. The proposed method first selects bona fide point clouds with neutral expressions. The two input point clouds are then registered using Bayesian Coherent Point Drift (BCPD) without optimization, and the geometry and color of the registered point clouds are averaged to generate a face-morphing point cloud. The proposed method generates 388 face-morphing point clouds from 200 bona fide subjects. Its effectiveness is demonstrated through extensive vulnerability experiments, achieving a Generalized Morphing Attack Potential (G-MAP) of 97.93%, superior to the existing state of the art (SOTA) with a G-MAP of 81.61%.
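Once the two clouds are registered with point-to-point correspondence, the morphing step reduces to a weighted average of coordinates and colors. A minimal sketch of that averaging step only (registration via BCPD is assumed already done; the array layout is our assumption):

```python
import numpy as np

def morph_point_clouds(pc_a, pc_b, w=0.5):
    """Blend two registered point clouds with one-to-one correspondence.

    Each cloud is an (N, 6) array of x, y, z, r, g, b per point; w is the
    blend weight toward pc_a. Geometry and color are averaged together.
    """
    assert pc_a.shape == pc_b.shape
    return w * pc_a + (1.0 - w) * pc_b
```

The quality of the morph therefore hinges entirely on the registration: averaging without accurate correspondences would smear facial geometry rather than blend identities.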
https://arxiv.org/abs/2404.15765
Cooperative Adaptive Cruise Control (CACC) is a quintessential control strategy for orchestrating vehicular platoon movement within Connected and Automated Vehicle (CAV) systems, significantly enhancing traffic efficiency and reducing energy consumption. In recent years, data-driven methods such as reinforcement learning (RL) have been employed for this task due to their significant advantages in efficiency and flexibility. However, the delay issue, which often arises in real-world CACC systems, is rarely taken into account by current RL-based approaches. To tackle this problem, we propose a Delay-Aware Multi-Agent Reinforcement Learning (DAMARL) framework aimed at achieving safe and stable control for CACC. We model the entire decision-making process as a Multi-Agent Delay-Aware Markov Decision Process (MADA-MDP) and develop a centralized-training, decentralized-execution (CTDE) MARL framework for distributed control of CACC platoons. An attention-mechanism-integrated policy network is introduced to enhance CAV communication and decision-making. Additionally, a velocity-optimization-model-based action filter is incorporated to further ensure the stability of the platoon. Experimental results across various delay conditions and platoon sizes demonstrate that our approach consistently outperforms baseline methods in platoon safety, stability, and overall performance.
https://arxiv.org/abs/2404.15696
Generative pre-trained transformers (GPTs) are a type of large language model that is unusually adept at producing novel, coherent natural language. This study examines the ability of GPT models to generate novel and correct, and notably very insecure, implementations of the cryptographic hash function SHA-1. The GPT models Llama-2-70b-chat-h, Mistral-7B-Instruct-v0.1, and zephyr-7b-alpha are used. The models are prompted to re-write each function using a modified version of the localGPT framework and langchain, which provide word-embedding context from the full source code and header files. This yields over 130,000 function-rewrite GPT output text blocks, approximately 40,000 of which could be parsed as C code and subsequently compiled. The generated code is analyzed for compilability, correctness of the algorithm, memory leaks, compiler-optimization stability, and character distance to the reference implementation. Remarkably, several generated function variants carry a high implementation-security risk: they are correct for some test vectors but incorrect for others. Additionally, many function implementations did not match the SHA-1 reference algorithm, yet produced hashes with some of the basic characteristics of hash functions. Many of the function re-writes contained serious flaws such as memory leaks, integer overflows, out-of-bounds accesses, use of uninitialised values, and compiler-optimization instability. Compiler-optimization settings and SHA-256 checksums of the compiled binaries are used to cluster implementations that are equivalent but may not have identical syntax; using this clustering, over 100,000 novel, correct versions of the SHA-1 codebase were generated in which each component C function differs from the reference implementation.
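The checksum-based clustering step is straightforward to sketch: two syntactically different sources that compile to byte-identical binaries hash to the same digest and land in one cluster. A minimal sketch with assumed names (the study additionally clusters across compiler-optimization settings):

```python
import hashlib
from collections import defaultdict

def cluster_binaries(blobs):
    """Group compiled binaries by SHA-256 checksum.

    blobs: dict mapping a name (e.g., source-file id) to the compiled
    binary's bytes. Returns digest -> sorted list of names.
    """
    clusters = defaultdict(list)
    for name, data in blobs.items():
        clusters[hashlib.sha256(data).hexdigest()].append(name)
    return {digest: sorted(names) for digest, names in clusters.items()}
```

This gives a cheap semantic-equivalence proxy: it never produces false merges (distinct binaries get distinct digests, barring SHA-256 collisions), though it can miss equivalent implementations that compile to different machine code.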
https://arxiv.org/abs/2404.15681
Reinforcement learning (RL) with continuous state and action spaces remains one of the most challenging problems within the field. Most current learning methods focus on integral identities such as value functions to derive an optimal strategy for the learning agent. In this paper, we instead study the dual form of the original RL formulation to propose the first differential RL framework that can handle settings with limited training samples and short-length episodes. Our approach introduces Differential Policy Optimization (DPO), a pointwise and stage-wise iteration method that optimizes policies encoded by local-movement operators. We prove a pointwise convergence estimate for DPO and provide a regret bound comparable with current theoretical works. Such pointwise estimate ensures that the learned policy matches the optimal path uniformly across different steps. We then apply DPO to a class of practical RL problems which search for optimal configurations with Lagrangian rewards. DPO is easy to implement, scalable, and shows competitive results on benchmarking experiments against several popular RL methods.
https://arxiv.org/abs/2404.15617
This paper proposes a decentralized trajectory planning framework for the collision avoidance problem of multiple micro aerial vehicles (MAVs) in environments with static and dynamic obstacles. The framework utilizes spatiotemporal occupancy grid maps (SOGM), which forecast the occupancy status of neighboring space in the near future, as the environment representation. Based on this representation, we extend the kinodynamic A* and the corridor-constrained trajectory optimization algorithms to efficiently tackle static and dynamic obstacles with arbitrary shapes. Collision avoidance between communicating robots is integrated by sharing planned trajectories and projecting them onto the SOGM. The simulation results show that our method achieves competitive performance against state-of-the-art methods in dynamic environments with different numbers and shapes of obstacles. Finally, the proposed method is validated in real experiments.
https://arxiv.org/abs/2404.15602