Domain shift is a common challenge in machine learning, and various domain generalization (DG) techniques have been developed to improve the performance of machine learning systems on out-of-distribution (OOD) data. In real-world scenarios, moreover, data distributions can change gradually across a sequence of domains. While current methods primarily focus on improving model effectiveness within these new domains, they often overlook fairness issues throughout the learning process. In response, we introduce a framework called Counterfactual Fairness-Aware Domain Generalization with Sequential Autoencoder (CDSAE). This approach separates environmental information and sensitive attributes from the embedded representation of classification features. This concurrent separation both improves model generalization across diverse and unfamiliar domains and addresses challenges related to unfair classification. Our strategy is rooted in the principles of causal inference. To examine the relationship between semantic information, sensitive attributes, and environmental cues, we categorize exogenous uncertainty factors into four latent variables: 1) semantic information influenced by sensitive attributes, 2) semantic information unaffected by sensitive attributes, 3) environmental cues influenced by sensitive attributes, and 4) environmental cues unaffected by sensitive attributes. By incorporating fairness regularization, we use only semantic information for classification. Empirical validation on synthetic and real-world datasets supports the effectiveness of our approach, showing improved accuracy while preserving fairness across continuously evolving domains.
https://arxiv.org/abs/2309.13005
Offline multi-agent reinforcement learning is challenging because it couples the distribution shift issue common in the offline setting with the high-dimensionality issue common in the multi-agent setting, making the out-of-distribution (OOD) action and value overestimation phenomena excessively severe. To mitigate this problem, we propose a novel multi-agent offline RL algorithm, named CounterFactual Conservative Q-Learning (CFCQL), to conduct conservative value estimation. Rather than regarding all the agents as a single high-dimensional agent and directly applying single-agent methods to it, CFCQL calculates the conservative regularization for each agent separately in a counterfactual way and then linearly combines them to realize an overall conservative value estimation. We prove that it still enjoys the underestimation property and the performance guarantee of single-agent conservative methods, but the induced regularization and safe policy improvement bound are independent of the number of agents. It is therefore theoretically superior to the direct treatment described above, especially when the number of agents is large. We further conduct experiments on four environments, covering both discrete and continuous action settings, on both existing datasets and datasets we constructed, demonstrating that CFCQL outperforms existing methods on most datasets, on some of them by a remarkable margin.
https://arxiv.org/abs/2309.12696
The use of Implicit Neural Representation (INR) through a hash-table has demonstrated impressive effectiveness and efficiency in characterizing intricate signals. However, current state-of-the-art methods exhibit insufficient regularization, often yielding unreliable and noisy results during interpolations. We find that this issue stems from broken gradient flow between input coordinates and indexed hash-keys, where the chain rule attempts to model discrete hash-keys, rather than the continuous coordinates. To tackle this concern, we introduce RHINO, in which a continuous analytical function is incorporated to facilitate regularization by connecting the input coordinate and the network additionally without modifying the architecture of current hash-based INRs. This connection ensures a seamless backpropagation of gradients from the network's output back to the input coordinates, thereby enhancing regularization. Our experimental results not only showcase the broadened regularization capability across different hash-based INRs like DINER and Instant NGP, but also across a variety of tasks such as image fitting, representation of signed distance functions, and optimization of 5D static / 6D dynamic neural radiance fields. Notably, RHINO outperforms current state-of-the-art techniques in both quality and speed, affirming its superiority.
https://arxiv.org/abs/2309.12642
Domain generalization studies the problem of training a model with samples from several domains (or distributions) and then testing the model with samples from a new, unseen domain. In this paper, we propose a novel approach for domain generalization that leverages recent advances in large vision-language models, specifically a CLIP teacher model, to train a smaller model that generalizes to unseen domains. The key technical contribution is a new type of regularization that requires the student's learned image representations to be close to the teacher's learned text representations obtained from encoding the corresponding text descriptions of images. We introduce two designs of the loss function, absolute and relative distance, which provide specific guidance on how the training process of the student model should be regularized. We evaluate our proposed method, dubbed RISE (Regularized Invariance with Semantic Embeddings), on various benchmark datasets and show that it outperforms several state-of-the-art domain generalization methods. To our knowledge, our work is the first to leverage knowledge distillation using a large vision-language model for domain generalization. By incorporating text-based information, RISE improves the generalization capability of machine learning models.
https://arxiv.org/abs/2309.12530
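To make the two loss designs named in the RISE abstract above more concrete, here is a minimal sketch. The abstract does not give the formulas, so everything below is an assumption for illustration: cosine similarity as the distance measure, the function names `absolute_distance` and `relative_distance`, and in particular the reading of "relative distance" as matching the student's similarities to a set of text embeddings against the teacher's.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def absolute_distance(student_img, teacher_txt):
    """Assumed form: pull the student's image embedding directly toward
    the teacher's text embedding of the matching description."""
    return 1.0 - cosine(student_img, teacher_txt)

def relative_distance(student_img, teacher_img, teacher_txts):
    """Assumed form: make the student's similarities to every text
    embedding match the teacher's similarities (mean squared gap)."""
    gaps = [(cosine(student_img, t) - cosine(teacher_img, t)) ** 2
            for t in teacher_txts]
    return sum(gaps) / len(gaps)
```

In a training loop, either term would be added to the usual classification loss with a weighting coefficient; that coefficient is likewise not specified in the abstract.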
The rapid development of 3D object detection systems for self-driving cars has significantly improved accuracy. However, these systems struggle to generalize across diverse driving environments, which can lead to safety-critical failures in detecting traffic participants. To address this, we propose a method that utilizes unlabeled repeated traversals of multiple locations to adapt object detectors to new driving environments. By incorporating statistics computed from repeated LiDAR scans, we guide the adaptation process effectively. Our approach enhances LiDAR-based detection models using spatially quantized historical features and introduces a lightweight regression head to leverage the statistics for feature regularization. Additionally, we leverage the statistics for a novel self-training process to stabilize the training. The framework is detector model-agnostic, and experiments on real-world datasets demonstrate significant improvements, achieving up to a 20-point performance gain, especially in detecting pedestrians and distant objects. Code is available.
https://arxiv.org/abs/2309.12140
Unsupervised domain adaptation (UDA) is an effective approach to handle the lack of annotations in the target domain for the semantic segmentation task. In this work, we consider a more practical UDA setting where the target domain contains sequential frames of the unlabeled videos which are easy to collect in practice. A recent study suggests self-supervised learning of the object motion from unlabeled videos with geometric constraints. We design a motion-guided domain adaptive semantic segmentation framework (MoDA), that utilizes self-supervised object motion to learn effective representations in the target domain. MoDA differs from previous methods that use temporal consistency regularization for the target domain frames. Instead, MoDA deals separately with the domain alignment on the foreground and background categories using different strategies. Specifically, MoDA contains foreground object discovery and foreground semantic mining to align the foreground domain gaps by taking the instance-level guidance from the object motion. Additionally, MoDA includes background adversarial training which contains a background category-specific discriminator to handle the background domain gaps. Experimental results on multiple benchmarks highlight the effectiveness of MoDA against existing approaches in the domain adaptive image segmentation and domain adaptive video segmentation. Moreover, MoDA is versatile and can be used in conjunction with existing state-of-the-art approaches to further improve performance.
https://arxiv.org/abs/2309.11711
Cloth manipulation is a category of deformable object manipulation of great interest to the robotics community, from applications of automated laundry-folding and home organizing and cleaning to textiles and flexible manufacturing. Despite the desire for automated cloth manipulation, the thin-shell dynamics and under-actuation nature of cloth present significant challenges for robots to effectively interact with them. Many recent works omit explicit modeling in favor of learning-based methods that may yield control policies directly. However, these methods require large training sets that must be collected and curated. In this regard, we create a framework for differentiable modeling of cloth dynamics leveraging an Extended Position-based Dynamics (XPBD) algorithm. Together with the desired control objective, physics-aware regularization terms are designed for better results, including trajectory smoothness and elastic potential energy. In addition, safety constraints, such as avoiding obstacles, can be specified using signed distance functions (SDFs). We formulate the cloth manipulation task with safety constraints as a constrained optimization problem, which can be effectively solved by mainstream gradient-based optimizers thanks to the end-to-end differentiability of our framework. Finally, we assess the proposed framework for manipulation tasks with various safety thresholds and demonstrate the feasibility of result trajectories on a surgical robot. The effects of the regularization terms are analyzed in an additional ablation study.
https://arxiv.org/abs/2309.11655
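The cloth manipulation abstract above specifies safety constraints through signed distance functions (SDFs). As a rough illustration of that idea only, the sketch below uses a spherical obstacle and a hinge-style violation measure; the obstacle shape, the function names, and the way violations are summed are assumptions and are not taken from the paper.

```python
import math

def sphere_sdf(point, center, radius):
    """Signed distance from a point to a sphere's surface:
    positive outside, zero on the surface, negative inside."""
    return math.dist(point, center) - radius

def constraint_violation(particles, center, radius, threshold):
    """Total amount by which cloth particles come closer to the
    obstacle than the safety threshold (0.0 when all are safe)."""
    return sum(max(0.0, threshold - sphere_sdf(p, center, radius))
               for p in particles)
```

A constrained optimizer would require this quantity to be zero along the whole trajectory, with larger thresholds giving more conservative motions.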
Unsupervised multi-view representation learning has been extensively studied for mining multi-view data. However, some critical challenges remain. On the one hand, the existing methods cannot explore multi-view data comprehensively since they usually learn a common representation between views, given that multi-view data contains both the common information between views and the specific information within each view. On the other hand, to mine the nonlinear relationship between data, kernel or neural network methods are commonly used for multi-view representation learning. However, these methods are lacking in interpretability. To this end, this paper proposes a new multi-view fuzzy representation learning method based on the interpretable Takagi-Sugeno-Kang (TSK) fuzzy system (MVRL_FS). The method realizes multi-view representation learning from two aspects. First, multi-view data are transformed into a high-dimensional fuzzy feature space, while the common information between views and specific information of each view are explored simultaneously. Second, a new regularization method based on L_(2,1)-norm regression is proposed to mine the consistency information between views, while the geometric structure of the data is preserved through the Laplacian graph. Finally, extensive experiments on many benchmark multi-view datasets are conducted to validate the superiority of the proposed method.
https://arxiv.org/abs/2309.11473
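The MVRL_FS abstract above relies on the L_(2,1) norm. For reference, this norm is the sum of the Euclidean norms of a matrix's rows, so penalizing it tends to drive entire rows to zero. The helper below only computes the norm; how it enters the paper's regression objective is not stated in the abstract and is not modeled here.

```python
import math

def l21_norm(matrix):
    """L_(2,1) norm: sum over rows of each row's Euclidean (L2) norm."""
    return sum(math.sqrt(sum(x * x for x in row)) for row in matrix)

# Rows have norms 5, 0, and 1.
W = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]]
total = l21_norm(W)
```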
Sensing and communication technologies have enhanced learning-based decision-making methodologies for multi-agent systems such as connected autonomous vehicles (CAVs). However, most existing safe reinforcement learning methods assume accurate state information. Meeting safety requirements under state uncertainty remains challenging for CAVs, given noisy sensor measurements and vulnerable communication channels. In this work, we propose a Robust Multi-Agent Proximal Policy Optimization with robust Safety Shield (SR-MAPPO) for CAVs in various driving scenarios. Our approach uses both a robust MARL algorithm and a control barrier function (CBF)-based safety shield to cope with perturbed or uncertain state inputs. The robust policy is trained with a worst-case Q function regularization module that pursues a higher lower-bounded reward, whereas the robust CBF safety shield enforces CAVs' collision-free constraints in complicated driving scenarios, even with perturbed vehicle state information. We validate the advantages of SR-MAPPO in robustness and safety and compare it with baselines under different driving and state perturbation scenarios in the CARLA simulator. The SR-MAPPO policy is verified to maintain higher safety rates and efficiency (reward) when threatened by both state perturbations and unconnected vehicles' dangerous behaviors.
https://arxiv.org/abs/2309.11057
Accurately determining salient regions of an image is challenging when labeled data is scarce. DINO-based self-supervised approaches have recently leveraged meaningful image semantics captured by patch-wise features for locating foreground objects. Recent methods have also incorporated intuitive priors and demonstrated value in unsupervised methods for object partitioning. In this paper, we propose SEMPART, which jointly infers coarse and fine bi-partitions over an image's DINO-based semantic graph. Furthermore, SEMPART preserves fine boundary details using graph-driven regularization and successfully distills the coarse mask semantics into the fine mask. Our salient object detection and single object localization findings suggest that SEMPART produces high-quality masks rapidly without additional post-processing and benefits from co-optimizing the coarse and fine branches.
https://arxiv.org/abs/2309.10972
This work presents a new method for enhancing communication efficiency in stochastic Federated Learning that trains over-parameterized random networks. In this setting, a binary mask is optimized instead of the model weights, which are kept fixed. The mask characterizes a sparse sub-network that is able to generalize as well as a smaller target network. Importantly, sparse binary masks are exchanged rather than the floating-point weights used in traditional federated learning, reducing communication cost to at most 1 bit per parameter. We show that previous state-of-the-art stochastic methods fail to find sparse networks that can reduce the communication and storage overhead using consistent loss objectives. To address this, we propose adding a regularization term to local objectives that encourages sparser solutions by eliminating redundant features across sub-networks. Extensive experiments demonstrate significant improvements in communication and memory efficiency of up to five orders of magnitude compared to the literature, with minimal performance degradation in validation accuracy in some instances.
https://arxiv.org/abs/2309.10834
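The "at most 1 bit per parameter" claim in the federated learning abstract above follows from exchanging a binary mask over fixed random weights. The sketch below illustrates only that bookkeeping; the per-parameter keep probabilities, the function names, and the bit layout are assumptions for illustration, and the paper's regularizer and aggregation rule are not modeled.

```python
import random

def sample_mask(probs, rng):
    """Draw a binary mask: keep parameter i with probability probs[i]."""
    return [1 if rng.random() < p else 0 for p in probs]

def pack_bits(mask):
    """Pack a 0/1 list into bytes, one bit per parameter."""
    out = bytearray((len(mask) + 7) // 8)
    for i, bit in enumerate(mask):
        if bit:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)

def unpack_bits(payload, n):
    """Recover the first n mask bits from a packed payload."""
    return [(payload[i // 8] >> (i % 8)) & 1 for i in range(n)]

def apply_mask(weights, mask):
    """The fixed random weights are never sent; only the mask selects them."""
    return [w * m for w, m in zip(weights, mask)]
```

With nine parameters the payload is two bytes, compared with 36 bytes for nine 32-bit floats. Sparser masks can be compressed further, which is presumably where the "at most" comes from.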
Music motif, as a conceptual building block of composition, is crucial for music structure analysis and automatic composition. While human listeners can identify motifs easily, existing computational models fall short in representing motifs and their developments. The reason is that the nature of motifs is implicit, and the diversity of motif variations extends beyond simple repetitions and modulations. In this study, we aim to learn the implicit relationship between motifs and their variations via representation learning, using the Siamese network architecture and a pretraining and fine-tuning pipeline. A regularization-based method, VICReg, is adopted for pretraining, while contrastive learning is used for fine-tuning. Experimental results on a retrieval-based task show that these two methods complement each other, yielding an improvement of 12.6% in the area under the precision-recall curve. Lastly, we visualize the acquired motif representations, offering an intuitive comprehension of the overall structure of a music piece. As far as we know, this work marks a noteworthy step forward in computational modeling of music motifs. We believe that this work lays the foundations for future applications of motifs in automatic music composition and music information retrieval.
https://arxiv.org/abs/2309.10597
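The music motif abstract above pretrains with VICReg, a regularization-based objective with variance, invariance, and covariance terms. The sketch below follows the commonly cited form of that loss on small lists of embeddings. The default weights and margin are assumptions here, and nothing specific to the motif model (encoder, augmentations) is represented.

```python
import math

def covariance_matrix(Z):
    """Unbiased covariance of the embedding dimensions; Z is n rows by d columns."""
    n, d = len(Z), len(Z[0])
    means = [sum(row[j] for row in Z) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in Z]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def variance_term(Z, gamma=1.0, eps=1e-4):
    """Hinge that keeps each dimension's standard deviation above gamma."""
    cov = covariance_matrix(Z)
    d = len(cov)
    return sum(max(0.0, gamma - math.sqrt(cov[j][j] + eps)) for j in range(d)) / d

def covariance_term(Z):
    """Penalize off-diagonal covariance to decorrelate dimensions."""
    cov = covariance_matrix(Z)
    d = len(cov)
    return sum(cov[i][j] ** 2 for i in range(d) for j in range(d) if i != j) / d

def invariance_term(Za, Zb):
    """Mean squared Euclidean distance between paired embeddings."""
    return sum(sum((a - b) ** 2 for a, b in zip(ra, rb))
               for ra, rb in zip(Za, Zb)) / len(Za)

def vicreg_loss(Za, Zb, lam=25.0, mu=25.0, nu=1.0):
    """Weighted sum of the three terms over two views of the same batch."""
    return (lam * invariance_term(Za, Zb)
            + mu * (variance_term(Za) + variance_term(Zb))
            + nu * (covariance_term(Za) + covariance_term(Zb)))
```

The variance hinge is what prevents collapse without negative pairs: a batch whose embeddings are all identical gets a large variance penalty even though its invariance term is zero.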
Catastrophic forgetting of previous knowledge is a critical issue in continual learning, typically handled through various regularization strategies. However, existing methods struggle, especially when several incremental steps are performed. In this paper, we extend our previous approach (RECALL) and tackle forgetting by exploiting unsupervised web-crawled data to retrieve examples of old classes from online databases. Unlike the original approach, which did not perform any evaluation of the web data, here we introduce two novel approaches based on adversarial learning and adaptive thresholding to select from web data only samples strongly resembling the statistics of the no longer available training ones. Furthermore, we improve the pseudo-labeling scheme to achieve a more accurate labeling of web data that also considers classes being learned in the current step. Experimental results show that this enhanced approach achieves remarkable results, especially when multiple incremental learning steps are performed.
https://arxiv.org/abs/2309.10479
We propose a family of curvature-based regularization terms for deep generative model learning. Explicit coordinate-invariant formulas for both intrinsic and extrinsic curvature measures are derived for the case of arbitrary data manifolds embedded in higher-dimensional Euclidean space. Because computing the curvature is a highly computation-intensive process involving the evaluation of second-order derivatives, efficient formulas are derived for approximately evaluating intrinsic and extrinsic curvatures. Comparative studies are conducted that compare the relative efficacy of intrinsic versus extrinsic curvature-based regularization measures, as well as performance comparisons against existing autoencoder training methods. Experiments involving noisy motion capture data confirm that curvature-based methods outperform existing autoencoder regularization methods, with intrinsic curvature measures slightly more effective than extrinsic curvature measures.
https://arxiv.org/abs/2309.10237
Two prevalent types of distributional shifts in machine learning are the covariate shift (as observed across different domains) and the semantic shift (as seen across different classes). Traditional OOD detection techniques typically address only one of these shifts. However, real-world testing environments often present a combination of both covariate and semantic shifts. In this study, we introduce a novel problem, semantic OOD detection across domains, which simultaneously addresses both distributional shifts. To this end, we introduce two regularization strategies: domain generalization regularization, which ensures semantic invariance across domains to counteract the covariate shift, and OOD detection regularization, designed to enhance OOD detection capabilities against the semantic shift through energy bounding. Through rigorous testing on three standard domain generalization benchmarks, our proposed framework showcases its superiority over conventional domain generalization approaches in terms of OOD detection performance. Moreover, it holds its ground by maintaining comparable InD classification accuracy.
https://arxiv.org/abs/2309.10209
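The semantic OOD detection abstract above mentions "energy bounding". One common reading, sketched here under stated assumptions, scores each input by the free energy of its logits and penalizes in-distribution samples whose energy is too high and OOD samples whose energy is too low. The squared-hinge form, the margins, and the function names are assumptions for illustration, not the paper's exact regularizer.

```python
import math

def energy(logits, temperature=1.0):
    """Free energy -T * logsumexp(logits / T); lower means more confident."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    return -temperature * (m + math.log(sum(math.exp(s - m) for s in scaled)))

def energy_bound_penalty(e, margin_in, margin_out, is_ood):
    """Squared hinge: in-distribution energy should stay below margin_in,
    OOD energy should stay above margin_out."""
    if is_ood:
        return max(0.0, margin_out - e) ** 2
    return max(0.0, e - margin_in) ** 2
```

At test time the same energy score can be thresholded to flag semantic OOD inputs, with the threshold chosen on held-out data.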
Deep learning-based methods have been extensively explored for automatic building mapping from high-resolution remote sensing images in recent years. While most building mapping models produce vector polygons of buildings for geographic and mapping systems, dominant methods typically decompose polygonal building extraction into several sub-problems, including segmentation, polygonization, and regularization, leading to complex inference procedures, low accuracy, and poor generalization. In this paper, we propose a simple and novel building mapping method with Hierarchical Transformers, called HiT, improving polygonal building mapping quality from high-resolution remote sensing images. HiT builds on a two-stage detection architecture by adding a polygon head parallel to the classification and bounding box regression heads. HiT simultaneously outputs building bounding boxes and vector polygons and is fully end-to-end trainable. The polygon head formulates a building polygon as serialized vertices with the bidirectional characteristic, a simple and elegant polygon representation that avoids the start or end vertex hypothesis. Under this new perspective, the polygon head adopts a transformer encoder-decoder architecture to predict serialized vertices supervised by the designed bidirectional polygon loss. Furthermore, a hierarchical attention mechanism combined with a convolution operation is introduced in the encoder of the polygon head, providing more geometric structures of building polygons at vertex and edge levels. Comprehensive experiments on two benchmarks (the CrowdAI and Inria datasets) demonstrate that our method achieves a new state of the art in terms of instance segmentation and polygonal metrics. Moreover, qualitative results verify the superiority and effectiveness of our model under complex scenes.
https://arxiv.org/abs/2309.09643
Brain-inspired spiking neural networks (SNNs) have demonstrated great potential for temporal signal processing. However, their performance in speech processing remains limited due to the lack of an effective auditory front-end. To address this limitation, we introduce Spiking-LEAF, a learnable auditory front-end meticulously designed for SNN-based speech processing. Spiking-LEAF combines a learnable filter bank with a novel two-compartment spiking neuron model called IHC-LIF. The IHC-LIF neurons draw inspiration from the structure of inner hair cells (IHC) and they leverage segregated dendritic and somatic compartments to effectively capture multi-scale temporal dynamics of speech signals. Additionally, the IHC-LIF neurons incorporate the lateral feedback mechanism along with spike regularization loss to enhance spike encoding efficiency. On keyword spotting and speaker identification tasks, the proposed Spiking-LEAF outperforms both SOTA spiking auditory front-ends and conventional real-valued acoustic features in terms of classification accuracy, noise robustness, and encoding efficiency.
https://arxiv.org/abs/2309.09469
Transparent objects are common in daily life. However, depth sensing for transparent objects remains a challenging problem. While learning-based methods can leverage shape priors to improve the sensing quality, the labor-intensive data collection in the real world and the sim-to-real domain gap restrict these methods' scalability. In this paper, we propose a method to finetune a stereo network with sparse depth labels automatically collected using a probing system with tactile feedback. We present a novel utility function to evaluate the benefit of touches. By approximating and optimizing the utility function, we can optimize the probing locations given a fixed touching budget to better improve the network's performance on real objects. We further combine tactile depth supervision with a confidence-based regularization to prevent over-fitting during finetuning. To evaluate the effectiveness of our method, we construct a real-world dataset including both diffuse and transparent objects. Experimental results on this dataset show that our method can significantly improve real-world depth sensing accuracy, especially for transparent objects.
https://arxiv.org/abs/2309.09427
Recently, data-driven techniques have demonstrated remarkable effectiveness in addressing challenges related to MR imaging inverse problems. However, these methods still exhibit certain limitations in terms of interpretability and robustness. In response, we introduce Convex Latent-Optimized Adversarial Regularizers (CLEAR), a novel and interpretable data-driven paradigm. CLEAR represents a fusion of deep learning (DL) and variational regularization. Specifically, we employ a latent optimization technique to adversarially train an input convex neural network, and its set of minima can fully represent the real data manifold. We utilize it as a convex regularizer to formulate a CLEAR-informed variational regularization model that guides the solution of the imaging inverse problem on the real data manifold. Leveraging its inherent convexity, we have established the convergence of the projected subgradient descent algorithm for the CLEAR-informed regularization model. This convergence guarantees the attainment of a unique solution to the imaging inverse problem, subject to certain assumptions. Furthermore, we have demonstrated the robustness of our CLEAR-informed model, explicitly showcasing its capacity to achieve stable reconstruction even in the presence of measurement interference. Finally, we illustrate the superiority of our approach using MRI reconstruction as an example. Our method consistently outperforms conventional data-driven techniques and traditional regularization approaches, excelling in both reconstruction quality and robustness.
https://arxiv.org/abs/2309.09250
During the last stage of RLHF, a large language model is aligned to human intents via PPO training, a process that generally requires large-scale computational resources. In this technical report, we empirically investigate an efficient implementation of RLHF using low-rank adaptation (LoRA), which allows us to align the LLaMA 7B checkpoint on the Alpaca dataset using only two A100 GPUs instead of the eight required for full model fine-tuning. Despite tuning only 0.2% of LLaMA 7B's parameters, our implementation achieves better performance than the publicly-released AlpacaFarm checkpoint with full model fine-tuning. Next, we analyze several configurations of our LoRA-based PPO implementation, varying the form of the KL regularization term in the training objective. We find that (1) removing this penalty term does not harm performance on the AlpacaFarm evaluation set under our LoRA setup; (2) other regularizers, such as Jensen-Shannon divergence, lead to improved performance; and (3) while PPO training negatively impacts the factuality of model-generated responses, training with LoRA largely mitigates this effect. We release our code and pretrained checkpoints to facilitate future research on more efficient RLHF.
https://arxiv.org/abs/2309.09055
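The RLHF report above varies the regularizer that keeps the policy close to a reference model, comparing the KL term with alternatives such as Jensen-Shannon divergence. The sketch below computes both divergences for a single discrete next-token distribution. The penalized-reward helper and its coefficient `beta` are assumptions showing one common way such a term is used, not the report's implementation.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; requires q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded above by log(2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def penalized_reward(reward, policy_probs, ref_probs, beta, divergence=kl_divergence):
    """Reward minus beta times the divergence from the reference policy
    (hypothetical helper; beta = 0 removes the penalty entirely)."""
    return reward - beta * divergence(policy_probs, ref_probs)
```

Because the Jensen-Shannon divergence is bounded, it cannot blow up when the policy puts mass on tokens the reference considers very unlikely, whereas the KL term grows without bound in that case.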