We introduce a new system for Multi-Session SLAM, which tracks camera motion across multiple disjoint videos under a single global reference. Our approach couples the prediction of optical flow with solver layers to estimate camera pose. The backbone is trained end-to-end using a novel differentiable solver for wide-baseline two-view pose. The full system can connect disjoint sequences, perform visual odometry, and run global optimization. Compared to existing approaches, our design is accurate and robust to catastrophic failures. Code is available at this http URL
https://arxiv.org/abs/2404.15263
This paper introduces FlowMap, an end-to-end differentiable method that solves for precise camera poses, camera intrinsics, and per-frame dense depth of a video sequence. Our method performs per-video gradient-descent minimization of a simple least-squares objective that compares the optical flow induced by depth, intrinsics, and poses against correspondences obtained via off-the-shelf optical flow and point tracking. Alongside the use of point tracks to encourage long-term geometric consistency, we introduce differentiable re-parameterizations of depth, intrinsics, and pose that are amenable to first-order optimization. We empirically show that camera parameters and dense depth recovered by our method enable photo-realistic novel view synthesis on 360-degree trajectories using Gaussian Splatting. Our method not only far outperforms prior gradient-descent based bundle adjustment methods, but surprisingly performs on par with COLMAP, the state-of-the-art SfM method, on the downstream task of 360-degree novel view synthesis (even though our method is purely gradient-descent based, fully differentiable, and presents a complete departure from conventional SfM).
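The core objective is compact enough to sketch directly: back-project pixels with the current depth and intrinsics, reproject them under the relative pose, and penalize the squared deviation from observed correspondences. A minimal NumPy sketch under a pinhole camera model (the function names and setup are illustrative, not FlowMap's actual implementation):

```python
import numpy as np

def induced_flow(depth, K, R, t, uv):
    """Flow induced at pixels uv by depth, intrinsics K, and relative pose (R, t).

    uv: (N, 2) pixel coordinates in frame 1; depth: (N,) depths at those pixels.
    Returns (N, 2) displacements to the reprojected locations in frame 2.
    """
    ones = np.ones((uv.shape[0], 1))
    rays = np.linalg.inv(K) @ np.concatenate([uv, ones], axis=1).T  # (3, N)
    pts = rays * depth                                # back-project to 3D
    proj = K @ (R @ pts + t[:, None])                 # reproject into frame 2
    uv2 = (proj[:2] / proj[2]).T
    return uv2 - uv

def flow_loss(depth, K, R, t, uv, observed_flow):
    """Simple least-squares objective comparing induced and observed flow."""
    r = induced_flow(depth, K, R, t, uv) - observed_flow
    return float((r ** 2).sum())

# Sanity check: an identity pose with zero translation induces zero flow,
# so the loss against a zero observed flow is zero.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
uv = np.array([[100.0, 120.0], [300.0, 200.0]])
depth = np.array([2.0, 4.0])
loss = flow_loss(depth, K, np.eye(3), np.zeros(3), uv, np.zeros_like(uv))
```

In the actual method this residual is minimized by per-video gradient descent over the re-parameterized depth, intrinsics, and poses.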
https://arxiv.org/abs/2404.15259
Replicating the remarkable athleticism seen in animals has long been a challenge in robotics control. Although Reinforcement Learning (RL) has demonstrated significant progress in dynamic legged locomotion control, the substantial sim-to-real gap often hinders the real-world demonstration of truly dynamic movements. We propose a new framework to mitigate this gap through frequency-domain analysis-based impedance matching between simulated and real robots. Our framework offers a structured guideline for parameter selection and the range for dynamics randomization in simulation, thus facilitating a safe sim-to-real transfer. The learned policy using our framework enabled jumps across distances of 55 cm and heights of 38 cm. The results are, to the best of our knowledge, among the highest and longest jumps demonstrated by an RL-based control policy on a real quadruped robot. Note that the achieved jumping height is approximately 85% of that obtained from a state-of-the-art trajectory optimization method, which can be seen as the physical limit for the given robot hardware. In addition, our control policy accomplished stable walking at speeds up to 2 m/s in the forward and backward directions, and 1 m/s in the sideways direction.
https://arxiv.org/abs/2404.15096
Sparse Mixtures of Experts (SMoE) scales model capacity without significant increases in training and inference costs, but exhibits the following two issues: (1) Low expert activation, where only a small subset of experts are activated for optimization. (2) Lacking fine-grained analytical capabilities for multiple semantic concepts within individual tokens. We propose Multi-Head Mixture-of-Experts (MH-MoE), which employs a multi-head mechanism to split each token into multiple sub-tokens. These sub-tokens are then assigned to and processed by a diverse set of experts in parallel, and seamlessly reintegrated into the original token form. The multi-head mechanism enables the model to collectively attend to information from various representation spaces within different experts, while significantly enhancing expert activation, thus deepening context understanding and alleviating overfitting. Moreover, our MH-MoE is straightforward to implement and decouples from other SMoE optimization methods, making it easy to integrate with other SMoE models for enhanced performance. Extensive experimental results across three tasks (English-focused language modeling, multi-lingual language modeling, and masked multi-modality modeling) demonstrate the effectiveness of MH-MoE.
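The splitting-and-reintegration mechanism can be sketched in a few lines: reshape a token into sub-tokens, route each sub-token to its top-1 expert, and concatenate the expert outputs back into the original token shape. A toy NumPy sketch with random linear experts and a dot-product router standing in for the learned SMoE components:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_heads, n_experts = 8, 4, 3
d_sub = d_model // n_heads

# Hypothetical stand-ins for the learned FFN experts and the router: each
# expert is a linear map on sub-token space, and the gate scores sub-tokens
# against per-expert keys.
experts = [rng.standard_normal((d_sub, d_sub)) for _ in range(n_experts)]
gate_keys = rng.standard_normal((n_experts, d_sub))

def mh_moe_layer(token):
    """Split a token into sub-tokens, route each to its top-1 expert, merge back."""
    sub_tokens = token.reshape(n_heads, d_sub)   # multi-head split
    outputs = []
    for s in sub_tokens:
        scores = gate_keys @ s                   # router logits per expert
        e = int(np.argmax(scores))               # top-1 expert assignment
        outputs.append(experts[e] @ s)           # expert processes sub-token
    return np.concatenate(outputs)               # reintegrate to token form

token = rng.standard_normal(d_model)
out = mh_moe_layer(token)
```

Because each of the `n_heads` sub-tokens is routed independently, a single token can activate several experts at once, which is the mechanism behind the improved expert activation the abstract describes.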
https://arxiv.org/abs/2404.15045
This paper focuses on the important societal challenge of water quality analysis. As one of the key factors in the economic and social development of society, the provision of water and the assurance of its quality have always remained top priorities for public authorities. To ensure the quality of water, different methods for monitoring and assessing water networks, such as offline and online surveys, are used. However, these surveys have several limitations, such as the limited number of participants and low frequency due to the labor involved in conducting them. In this paper, we propose a Natural Language Processing (NLP) framework to automatically collect and analyze water-related posts from social media for data-driven decisions. The proposed framework is composed of two components, namely (i) text classification and (ii) topic modeling. For text classification, we propose a merit-fusion-based framework incorporating several Large Language Models (LLMs), where different weight selection and optimization methods are employed to assign weights to the LLMs. For topic modeling, we employed the BERTopic library to discover the hidden topic patterns in the water-related tweets. We also analyzed relevant tweets originating from different regions and countries to explore global, regional, and country-specific water-related issues and concerns. Finally, we collected and manually annotated a large-scale dataset, which is expected to facilitate future research on the topic.
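At inference time, the merit-fusion idea for text classification reduces to a weighted combination of per-model class probabilities. A minimal sketch, where the probabilities and weights are made-up stand-ins for the LLM outputs and the learned merit weights:

```python
import numpy as np

def merit_fusion(probs_per_model, weights):
    """Weighted fusion of per-model class probabilities.

    probs_per_model: (n_models, n_classes) array-like; weights sum to 1.
    Returns the argmax class of the weighted sum.
    """
    weights = np.asarray(weights, dtype=float)
    fused = weights[:, None] * np.asarray(probs_per_model, dtype=float)
    return int(np.argmax(fused.sum(axis=0)))

# Three hypothetical LLM classifiers scoring a tweet as
# [not-water-related, water-related]; the second model receives the largest
# weight, e.g. from validation-set merit.
probs = [[0.6, 0.4], [0.2, 0.8], [0.55, 0.45]]
label = merit_fusion(probs, weights=[0.2, 0.5, 0.3])
```

The paper's contribution lies in how the weights are selected and optimized; the fusion step itself is this simple weighted vote.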
https://arxiv.org/abs/2404.14977
Weakly supervised segmentation methods have gained significant attention due to their ability to reduce the reliance on costly pixel-level annotations during model training. However, the current weakly supervised nuclei segmentation approaches typically follow a two-stage pseudo-label generation and network training process. The performance of the nuclei segmentation heavily relies on the quality of the generated pseudo-labels, thereby limiting its effectiveness. This paper introduces a novel domain-adaptive weakly supervised nuclei segmentation framework using cross-task interaction strategies to overcome the challenge of pseudo-label generation. Specifically, we utilize weakly annotated data to train an auxiliary detection task, which assists the domain adaptation of the segmentation network. To enhance the efficiency of domain adaptation, we design a consistent feature constraint module integrating prior knowledge from the source domain. Furthermore, we develop pseudo-label optimization and interactive training methods to improve the domain transfer capability. To validate the effectiveness of our proposed method, we conduct extensive comparative and ablation experiments on six datasets. The results demonstrate the superiority of our approach over existing weakly supervised approaches. Remarkably, our method achieves comparable or even better performance than fully supervised methods. Our code will be released in this https URL.
https://arxiv.org/abs/2404.14956
As one of the fundamental video tasks in computer vision, Open-Vocabulary Action Recognition (OVAR) has recently gained increasing attention with the development of vision-language pre-training. To enable generalization to arbitrary classes, existing methods treat class labels as text descriptions, then formulate OVAR as evaluating embedding similarity between visual samples and textual classes. However, one crucial issue is completely ignored: the class descriptions given by users may be noisy, e.g., misspellings and typos, limiting the real-world practicality of vanilla OVAR. To fill this research gap, this paper is the first to evaluate existing methods by simulating multi-level noises of various types, revealing their poor robustness. To tackle the noisy OVAR task, we further propose a novel DENOISER framework covering two parts: generation and discrimination. Concretely, the generative part denoises noisy class-text names via a decoding process, i.e., it proposes text candidates and then utilizes inter-modal and intra-modal information to vote for the best one. In the discriminative part, we use vanilla OVAR models to assign visual samples to class-text names, thus obtaining more semantics. For optimization, we alternately iterate between the generative and discriminative parts for progressive refinement. The denoised text classes help OVAR models classify visual samples more accurately; in return, the classified visual samples help better denoising. On three datasets, we carry out extensive experiments to show our superior robustness, and thorough ablations to dissect the effectiveness of each component.
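The generative part's candidate-proposal-and-voting step can be illustrated with plain string similarity in place of the paper's inter-/intra-modal voting. A toy sketch using Python's difflib, with a hypothetical class vocabulary:

```python
import difflib

# A hypothetical toy class vocabulary and a noisy user-supplied class name.
classes = ["playing guitar", "riding horse", "swimming", "running"]

def denoise_class_name(noisy, vocabulary, cutoff=0.6):
    """Propose candidates for a noisy class-text name, then pick one by string
    similarity (a crude stand-in for the paper's inter-/intra-modal voting)."""
    matches = difflib.get_close_matches(noisy, vocabulary, n=3, cutoff=cutoff)
    return matches[0] if matches else noisy   # fall back to the noisy input

fixed = denoise_class_name("playng guitr", classes)
```

In the actual framework the vote among candidates uses visual and textual embeddings rather than character overlap, and the candidate set is produced by a decoding process rather than a fixed vocabulary lookup.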
https://arxiv.org/abs/2404.14890
The integration of Foundation Models (FMs) with Federated Learning (FL) presents a transformative paradigm in Artificial Intelligence (AI), offering enhanced capabilities while addressing concerns of privacy, data decentralization, and computational efficiency. This paper provides a comprehensive survey of the emerging field of Federated Foundation Models (FedFM), elucidating their synergistic relationship and exploring novel methodologies, challenges, and future directions that the FL research field needs to focus on in order to thrive in the age of foundation models. A systematic multi-tiered taxonomy is proposed, categorizing existing FedFM approaches for model training, aggregation, trustworthiness, and incentivization. Key challenges, including how to enable FL to deal with high complexity of computational demands, privacy considerations, contribution evaluation, and communication efficiency, are thoroughly discussed. Moreover, the paper explores the intricate challenges of communication, scalability and security inherent in training/fine-tuning FMs via FL, highlighting the potential of quantum computing to revolutionize the training, inference, optimization and data encryption processes. This survey underscores the importance of further research to propel innovation in FedFM, emphasizing the need for developing trustworthy solutions. It serves as a foundational guide for researchers and practitioners interested in contributing to this interdisciplinary and rapidly advancing field.
https://arxiv.org/abs/2404.15381
Fine-tuning pre-trained protein language models (PLMs) has emerged as a prominent strategy for enhancing downstream prediction tasks, often outperforming traditional supervised learning approaches. Parameter-efficient fine-tuning, a powerful technique widely applied in natural language processing, could potentially enhance the performance of PLMs. However, the direct transfer to life science tasks is non-trivial due to the different training strategies and data forms. To address this gap, we introduce SES-Adapter, a simple, efficient, and scalable adapter method for enhancing the representation learning of PLMs. SES-Adapter incorporates PLM embeddings with structural sequence embeddings to create structure-aware representations. We show that the proposed method is compatible with different PLM architectures and across diverse tasks. Extensive evaluations are conducted on 2 types of folding structures with notable quality differences, 9 state-of-the-art baselines, and 9 benchmark datasets across distinct downstream tasks. Results show that, compared to vanilla PLMs, SES-Adapter improves downstream task performance by a maximum of 11% and an average of 3%, while accelerating training by a maximum of 1034% and an average of 362% and improving the convergence rate by approximately 2 times. Moreover, positive optimization is observed even with low-quality predicted structures. The source code for SES-Adapter is available at this https URL.
https://arxiv.org/abs/2404.14850
Reconstructing digital brain phantoms in the form of multi-channeled brain tissue probability maps for individual subjects is essential for capturing brain anatomical variability, understanding neurological diseases, as well as for testing image processing methods. We demonstrate the first framework that optimizes brain tissue probability maps (Gray Matter - GM, White Matter - WM, and Cerebrospinal fluid - CSF) with the help of a Physics-based differentiable MRI simulator that models the magnetization signal at each voxel in the image. Given an observed $T_1$/$T_2$-weighted MRI scan, the corresponding clinical MRI sequence, and the MRI differentiable simulator, we optimize the simulator's input probability maps by back-propagating the L2 loss between the simulator's output and the $T_1$/$T_2$-weighted scan. This approach has the significant advantage of not relying on any training data, and instead uses the strong inductive bias of the MRI simulator. We tested the model on 20 scans from the BrainWeb database and demonstrate a highly accurate reconstruction of GM, WM, and CSF.
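The optimization loop is conceptually simple: run the differentiable simulator forward, compare against the observed scan with an L2 loss, and back-propagate into the tissue probability maps. A toy NumPy sketch that replaces the physics simulator with a linear mixture model and keeps the maps valid probabilities via a softmax re-parameterization (all signal values are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
n_voxels, n_tissues = 50, 3                  # e.g. GM, WM, CSF

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Toy stand-in for the physics-based simulator: each tissue contributes a
# fixed (made-up) signal, and a voxel's signal is the probability-weighted
# mixture of tissue signals.
tissue_signal = np.array([0.8, 0.6, 0.2])
true_probs = softmax(rng.standard_normal((n_voxels, n_tissues)))
observed = true_probs @ tissue_signal        # the "T1/T2-weighted scan"

# Optimize unconstrained logits so the recovered maps stay on the simplex.
logits = np.zeros((n_voxels, n_tissues))
init_err = float(np.abs(softmax(logits) @ tissue_signal - observed).max())
lr = 0.5
for _ in range(2000):
    p = softmax(logits)
    residual = p @ tissue_signal - observed                    # L2 residual
    grad_p = 2.0 * residual[:, None] * tissue_signal[None, :]  # dL/dp
    # Back-propagate through the row-wise softmax Jacobian.
    grad_logits = p * (grad_p - (grad_p * p).sum(axis=1, keepdims=True))
    logits -= lr * grad_logits

final_err = float(np.abs(softmax(logits) @ tissue_signal - observed).max())
```

Because this toy objective matches only a scalar signal per voxel, the recovered maps are not unique; in the actual method the inductive bias of the full MRI sequence model constrains the solution.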
https://arxiv.org/abs/2404.14739
The execution of flight missions by unmanned aerial vehicles (UAV) primarily relies on navigation. In particular, the navigation pipeline has traditionally been divided into positioning and control, operating in a sequential loop. However, the existing navigation pipeline, where the positioning and control are decoupled, struggles to adapt to ubiquitous uncertainties arising from measurement noise, abrupt disturbances, and nonlinear dynamics. As a result, the navigation reliability of the UAV is significantly challenged in complex dynamic areas. For example, the ubiquitous global navigation satellite system (GNSS) positioning can be degraded by the signal reflections from surrounding high-rising buildings in complex urban areas, leading to significantly increased positioning uncertainty. An additional challenge is introduced to the control algorithm due to the complex wind disturbances in urban canyons. Given the fact that the system positioning and control are highly correlated with each other, this research proposes a **tightly joined positioning and control model (JPCM) based on factor graph optimization (FGO)**. In particular, the proposed JPCM combines sensor measurements from positioning and control constraints into a unified probabilistic factor graph. Specifically, the positioning measurements are formulated as the factors in the factor graph. In addition, the model predictive control (MPC) is also formulated as the additional factors in the factor graph. By solving the factor graph contributed by both the positioning-related factors and the MPC-based factors, the complementariness of positioning and control can be deeply exploited. Finally, we validate the effectiveness and resilience of the proposed method using a simulated quadrotor system which shows significantly improved trajectory following performance.
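In the linear-Gaussian case, jointly solving positioning and control factors reduces to one weighted least-squares problem over the trajectory. A toy 1-D NumPy sketch, with simple integrator dynamics standing in for the MPC factors and synthetic position fixes standing in for GNSS factors:

```python
import numpy as np

# Toy 1-D trajectory: dynamics x_{k+1} = x_k + u_k with known inputs u,
# plus noisy position fixes z_k. Stacking both factor types into one
# weighted linear least-squares problem is the linear-Gaussian special
# case of solving a factor graph.
n = 5
u = np.array([1.0, 1.0, 1.0, 1.0])           # control inputs (made up)
z = np.array([0.1, 1.05, 1.9, 3.1, 3.95])    # noisy GNSS-like fixes (made up)
w_pos, w_dyn = 1.0, 10.0                     # factor weights (1/sigma)

rows, rhs = [], []
for k in range(n):                           # positioning factors
    r = np.zeros(n); r[k] = w_pos
    rows.append(r); rhs.append(w_pos * z[k])
for k in range(n - 1):                       # dynamics (MPC-like) factors
    r = np.zeros(n); r[k + 1] = w_dyn; r[k] = -w_dyn
    rows.append(r); rhs.append(w_dyn * u[k])

A, b = np.array(rows), np.array(rhs)
x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
```

Raising `w_dyn` relative to `w_pos` pulls the estimate toward the dynamics model and away from noisy fixes, which is exactly the complementarity between positioning and control that the unified factor graph exploits.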
https://arxiv.org/abs/2404.14724
Large Language Models (LLMs) have demonstrated remarkable performance across a spectrum of tasks. Recently, Direct Preference Optimization (DPO) has emerged as an RL-free approach to optimize the policy model on human preferences. However, several limitations hinder the widespread adoption of this method. To address these shortcomings, various versions of DPO have been introduced. Yet, a comprehensive evaluation of these variants across diverse tasks is still lacking. In this study, we aim to bridge this gap by investigating the performance of alignment methods across three distinct scenarios: (1) keeping the Supervised Fine-Tuning (SFT) part, (2) skipping the SFT part, and (3) skipping the SFT part and utilizing an instruction-tuned model. Furthermore, we explore the impact of different training sizes on their performance. Our evaluation spans a range of tasks including dialogue systems, reasoning, mathematical problem-solving, question answering, truthfulness, and multi-task understanding, encompassing 13 benchmarks such as MT-Bench, Big Bench, and Open LLM Leaderboard. Key observations reveal that alignment methods achieve optimal performance with smaller training data subsets, exhibit limited effectiveness in reasoning tasks yet significantly impact mathematical problem-solving, and employing an instruction-tuned model notably influences truthfulness. We anticipate that our findings will catalyze further research aimed at developing more robust models to address alignment challenges.
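For reference, the DPO objective on a single preference pair is the negative log-sigmoid of a scaled log-likelihood-ratio margin between the chosen and rejected responses. A minimal NumPy sketch (the log-probabilities below are illustrative numbers):

```python
import numpy as np

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    logp_* : policy log-probs of the chosen (w) and rejected (l) response.
    ref_*  : the same quantities under the frozen reference model.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return float(-np.log(1.0 / (1.0 + np.exp(-margin))))  # -log sigmoid

# When the policy equals the reference, the margin is 0 and the loss is log 2.
neutral = dpo_loss(-10.0, -12.0, -10.0, -12.0)
# Raising the chosen response's likelihood relative to the reference lowers it.
improved = dpo_loss(-9.0, -12.0, -10.0, -12.0)
```

The variants surveyed in the paper modify this objective (e.g., dropping the reference model or changing the margin), while the evaluation scenarios vary what the policy is initialized from.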
https://arxiv.org/abs/2404.14723
Testing and evaluating the safety performance of autonomous vehicles (AVs) is essential before large-scale deployment. In practice, the number of testing scenarios permissible for a specific AV is severely limited by tight constraints on testing budgets and time. Under such strict limits on the number of tests, existing testing methods often lead to significant uncertainty or difficulty in quantifying evaluation results. In this paper, we formulate this problem for the first time as the "few-shot testing" (FST) problem and propose a systematic framework to address this challenge. To alleviate the considerable uncertainty inherent in a small testing scenario set, we frame the FST problem as an optimization problem and search for the testing scenario set based on neighborhood coverage and similarity. Specifically, guided by the better generalization ability of the testing scenario set on AVs, we dynamically adjust this set and the contribution of each testing scenario to the evaluation result based on coverage, leveraging the prior information of surrogate models (SMs). Under certain hypotheses on the SMs, a theoretical upper bound on the evaluation error is established to verify the sufficiency of evaluation accuracy within the given limited number of tests. Experimental results on cut-in scenarios demonstrate a notable reduction in the evaluation error and variance of our method compared to conventional testing methods, especially in situations with a strict limit on the number of scenarios.
https://arxiv.org/abs/2402.01795
The wide deployment of Face Recognition (FR) systems poses risks of privacy leakage. One countermeasure to address this issue is adversarial attacks, which deceive malicious FR searches but simultaneously interfere with the normal identity verification of trusted authorizers. In this paper, we propose the first Double Privacy Guard (DPG) scheme based on traceable adversarial watermarking. DPG employs a one-time watermark embedding to deceive unauthorized FR models and allows authorizers to perform identity verification by extracting the watermark. Specifically, we propose an information-guided adversarial attack against FR models. The encoder embeds an identity-specific watermark into the deep feature space of the carrier, guiding recognizable features of the image to deviate from the source identity. We further adopt a collaborative meta-optimization strategy compatible with sub-tasks, which regularizes the joint optimization direction of the encoder and decoder. This strategy enhances the representation of universal carrier features, mitigating multi-objective optimization conflicts in watermarking. Experiments confirm that DPG achieves significant attack success rates and traceability accuracy on state-of-the-art FR models, exhibiting remarkable robustness that outperforms existing privacy protection methods based on adversarial attacks and deep watermarking, or simple combinations of the two. Our work potentially opens up new insights into proactive protection for FR privacy.
https://arxiv.org/abs/2404.14693
In multi-robot systems, achieving coordinated missions remains a significant challenge due to the coupled nature of coordination behaviors and the lack of global information for individual robots. To mitigate these challenges, this paper introduces a novel approach, Bi-level Coordination Learning (Bi-CL), that leverages a bi-level optimization structure within a centralized training and decentralized execution paradigm. Our bi-level reformulation decomposes the original problem into a reinforcement learning level with reduced action space, and an imitation learning level that gains demonstrations from a global optimizer. Both levels contribute to improved learning efficiency and scalability. We note that robots' incomplete information leads to mismatches between the two levels of learning models. To address this, Bi-CL further integrates an alignment penalty mechanism, aiming to minimize the discrepancy between the two levels without degrading their training efficiency. We introduce a running example to conceptualize the problem formulation and apply Bi-CL to two variations of this example: route-based and graph-based scenarios. Simulation results demonstrate that Bi-CL can learn more efficiently and achieve comparable performance with traditional multi-agent reinforcement learning baselines for multi-robot coordination.
https://arxiv.org/abs/2404.14649
Controller tuning and parameter optimization are crucial in system design to improve closed-loop system performance. Bayesian optimization has been established as an efficient model-free controller tuning and adaptation method. However, Bayesian optimization methods are computationally expensive and therefore difficult to use in real-time critical scenarios. In this work, we propose a real-time purely data-driven, model-free approach for adaptive control, by online tuning low-level controller parameters. We base our algorithm on GoOSE, an algorithm for safe and sample-efficient Bayesian optimization, for handling performance and stability criteria. We introduce multiple computational and algorithmic modifications for computational efficiency and parallelization of optimization steps. We further evaluate the algorithm's performance on a real precision-motion system utilized in semiconductor industry applications by modifying the payload and reference stepsize and comparing it to an interpolated constrained optimization-based baseline approach.
https://arxiv.org/abs/2404.14602
This paper introduces PAG, a novel optimization and decoding approach that guides the autoregressive generation of document identifiers in generative retrieval models through simultaneous decoding. To this end, PAG constructs both a set-based and a sequential identifier for each document. Motivated by the bag-of-words assumption in information retrieval, the set-based identifier is built on lexical tokens. The sequential identifier, on the other hand, is obtained by quantizing relevance-based representations of documents. Extensive experiments on MSMARCO and TREC Deep Learning Track data reveal that PAG outperforms the state-of-the-art generative retrieval model by a large margin (e.g., 15.6% MRR improvement on MS MARCO), while achieving a 22x speedup in query latency.
https://arxiv.org/abs/2404.14600
Artificial intelligence (AI) and neuroscience share a rich history, with advancements in neuroscience shaping the development of AI systems capable of human-like knowledge retention. Leveraging insights from neuroscience and existing research in adversarial and continual learning, we introduce a novel framework comprising two core concepts: feature distillation and re-consolidation. Our framework, named Robust Rehearsal, addresses the challenge of catastrophic forgetting inherent in continual learning (CL) systems by distilling and rehearsing robust features. Inspired by the mammalian brain's memory consolidation process, Robust Rehearsal aims to emulate the rehearsal of distilled experiences during learning tasks. Additionally, it mimics memory re-consolidation, where new experiences influence the integration of past experiences to mitigate forgetting. Extensive experiments conducted on CIFAR10, CIFAR100, and real-world helicopter attitude datasets showcase the superior performance of CL models trained with Robust Rehearsal compared to baseline methods. Furthermore, examining different optimization training objectives-joint, continual, and adversarial learning-we highlight the crucial role of feature learning in model performance. This underscores the significance of rehearsing CL-robust samples in mitigating catastrophic forgetting. In conclusion, aligning CL approaches with neuroscience insights offers promising solutions to the challenge of catastrophic forgetting, paving the way for more robust and human-like AI systems.
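The rehearsal side of such a framework is commonly implemented with a bounded replay memory. Below is a minimal reservoir-sampling sketch of that component only; the feature distillation and re-consolidation steps of Robust Rehearsal are not modeled here:

```python
import random

class RehearsalBuffer:
    """Reservoir-style memory for rehearsing past-task samples during CL."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, sample):
        """Add a sample; each seen sample has equal chance of being retained."""
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(sample)
        else:
            # Replace with probability capacity / seen (reservoir sampling).
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = sample

    def rehearse(self, k):
        """Draw k stored samples to mix into the current task's batch."""
        return self.rng.sample(self.items, min(k, len(self.items)))

buf = RehearsalBuffer(capacity=10)
for i in range(100):          # stream of 100 past-task samples
    buf.add(i)
batch = buf.rehearse(4)
```

Interleaving `rehearse()` batches with new-task batches is the standard way rehearsal mitigates catastrophic forgetting; the paper's contribution is rehearsing distilled, CL-robust features rather than raw samples.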
https://arxiv.org/abs/2404.14588
Diffusion models (DMs) have established themselves as the state-of-the-art generative modeling approach in the visual domain and beyond. A crucial drawback of DMs is their slow sampling speed, relying on many sequential function evaluations through large neural networks. Sampling from DMs can be seen as solving a differential equation through a discretized set of noise levels known as the sampling schedule. While past works primarily focused on deriving efficient solvers, little attention has been given to finding optimal sampling schedules, and the entire literature relies on hand-crafted heuristics. In this work, for the first time, we propose a general and principled approach to optimizing the sampling schedules of DMs for high-quality outputs, called $\textit{Align Your Steps}$. We leverage methods from stochastic calculus and find optimal schedules specific to different solvers, trained DMs and datasets. We evaluate our novel approach on several image, video as well as 2D toy data synthesis benchmarks, using a variety of different samplers, and observe that our optimized schedules outperform previous hand-crafted schedules in almost all experiments. Our method demonstrates the untapped potential of sampling schedule optimization, especially in the few-step synthesis regime.
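The underlying idea, choosing where to place a fixed budget of discretization steps, can be illustrated on an ordinary ODE: for the same number of explicit Euler steps, a non-uniform schedule can substantially reduce the terminal error. A toy sketch that searches over a one-parameter family of schedules, a crude stand-in for the paper's principled stochastic-calculus derivation:

```python
import numpy as np

def euler_solve(ts):
    """Integrate x' = -10 t x with explicit Euler over the schedule ts; x(0)=1."""
    x = 1.0
    for t0, t1 in zip(ts[:-1], ts[1:]):
        x += (t1 - t0) * (-10.0 * t0 * x)
    return x

exact = np.exp(-5.0)                        # closed-form x(1) for x' = -10 t x
n = 10                                      # fixed step budget

def schedule_error(p):
    ts = np.linspace(0.0, 1.0, n + 1) ** p  # p = 1 is the uniform schedule
    return abs(euler_solve(ts) - exact)

uniform_err = schedule_error(1.0)
# "Optimize" the schedule by a simple search over the power parameter.
ps = np.linspace(0.3, 2.0, 35)
best_p = float(ps[np.argmin([schedule_error(p) for p in ps])])
best_err = schedule_error(best_p)
```

The best schedule concentrates steps where the dynamics are stiffest, mirroring how optimized sampling schedules concentrate solver steps at the noise levels that matter most in the few-step regime.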
https://arxiv.org/abs/2404.14507
Stable locomotion in precipitous environments is an essential capability of quadruped robots, demanding the ability to resist various external disturbances. However, recent learning-based policies only use basic domain randomization to improve the robustness of learned policies, which cannot guarantee that the robot has adequate disturbance resistance capabilities. In this paper, we propose to model the learning process as an adversarial interaction between the actor and a newly introduced disturber, and to ensure their optimization with an $H_{\infty}$ constraint. In contrast to the actor, which maximizes the discounted overall reward, the disturber is responsible for generating effective external forces and is optimized by maximizing the error between the task reward and its oracle, i.e., the "cost", in each iteration. To keep the joint optimization between the actor and the disturber stable, our $H_{\infty}$ constraint bounds the ratio between the cost and the intensity of the external forces. Through reciprocal interaction throughout the training phase, the actor can acquire the capability to navigate increasingly complex physical disturbances. We verify the robustness of our approach on quadrupedal locomotion tasks with the Unitree Aliengo robot, and also on a more challenging task with the Unitree A1 robot, where the quadruped is expected to perform locomotion merely on its hind legs as if it were a bipedal robot. The simulated quantitative results show improvement over baselines, demonstrating the effectiveness of the method and of each design choice. On the other hand, real-robot experiments qualitatively exhibit how robust the policy is when coping with various disturbances on various terrains, including stairs, high platforms, slopes, and slippery terrains. All code, checkpoints, and real-world deployment guidance will be made public.
https://arxiv.org/abs/2404.14405