Standing-up control is crucial for humanoid robots, with the potential for integration into current locomotion and loco-manipulation systems, such as fall recovery. Existing approaches are either limited to simulations that overlook hardware constraints or rely on predefined, ground-specific motion trajectories, failing to enable standing up across postures in real-world scenes. To bridge this gap, we present HoST (Humanoid Standing-up Control), a reinforcement learning framework that learns standing-up control from scratch, enabling robust sim-to-real transfer across diverse postures. HoST effectively learns posture-adaptive motions by leveraging a multi-critic architecture and curriculum-based training on diverse simulated terrains. To ensure successful real-world deployment, we constrain the motion with smoothness regularization and an implicit motion-speed bound to alleviate oscillatory and violent motions on physical hardware, respectively. After simulation-based training, the learned control policies are deployed directly on the Unitree G1 humanoid robot. Our experimental results demonstrate that the controllers achieve smooth, stable, and robust standing-up motions across a wide range of laboratory and outdoor environments. Videos are available at this https URL.
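As an illustration of the deployment constraints, here is a minimal Python sketch of an action-smoothness penalty and a bounded action mapping of the kind the abstract describes; the function names, the weights `w_rate` and `w_acc`, and the tanh squashing scaled by `beta` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def smoothness_penalty(a_t, a_prev, a_prev2, w_rate=0.1, w_acc=0.05):
    """Penalize first- and second-order differences of consecutive actions
    (joint-position targets) to suppress oscillatory motion."""
    rate = np.sum((a_t - a_prev) ** 2)                # action rate
    acc = np.sum((a_t - 2 * a_prev + a_prev2) ** 2)   # action acceleration
    return -(w_rate * rate + w_acc * acc)

def bounded_action(policy_output, q_default, beta=0.25):
    """Keep joint targets within a shrunken range around a default posture,
    so each control step commands only gentle motion (an implicit
    motion-speed bound)."""
    return q_default + beta * np.tanh(policy_output)
```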
https://arxiv.org/abs/2502.08378
Image deblurring remains a central research area within image processing, critical for its role in enhancing image quality and facilitating clearer visual representations across diverse applications. This paper tackles the optimization problem of image deblurring, assuming a known blurring kernel. We introduce an improved optimal proximal gradient algorithm (IOptISTA), which builds upon the optimal gradient method and a weighting matrix, to efficiently address the non-blind image deblurring problem. Based on two regularization cases, namely the $l_1$ norm and total variation norm, we perform numerical experiments to assess the performance of our proposed algorithm. The results indicate that our algorithm yields enhanced PSNR and SSIM values, as well as a reduced tolerance, compared to existing methods.
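For context, the classical ISTA iteration that proximal-gradient deblurring methods of this family build on looks roughly as follows. This is the textbook baseline, not IOptISTA itself; the dense-matrix blur operator and the step size `1/L` (with `L` the Lipschitz constant of the data-term gradient, i.e., the largest eigenvalue of `A.T @ A`) are simplifying assumptions.

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def ista_deblur(A, b, lam, L, n_iter=200):
    """Baseline ISTA for min_x 0.5*||A x - b||^2 + lam*||x||_1 with a known
    blur operator A. IOptISTA layers an optimal gradient scheme and a
    weighting matrix on top of this skeleton."""
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)                   # data-term gradient
        x = soft_threshold(x - grad / L, lam / L)  # proximal gradient step
    return x
```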
https://arxiv.org/abs/2502.07602
Attributing APT (Advanced Persistent Threat) malware to their respective groups is crucial for threat intelligence and cybersecurity. However, APT adversaries often conceal their identities, rendering attribution inherently adversarial. Existing machine learning-based attribution models, while effective, remain highly vulnerable to adversarial attacks. For example, the state-of-the-art byte-level model MalConv sees its accuracy drop from over 90% to below 2% under PGD (projected gradient descent) attacks. In this study, we applied existing gradient-based adversarial training techniques from malware detection and image processing to malware attribution, revealing that both robustness and training efficiency require significant improvement. To address this, we propose RoMA, a novel single-step adversarial training approach that integrates global perturbations to generate enhanced adversarial samples and employs adversarial consistency regularization to improve representation quality and resilience. A novel APT malware dataset named AMG18, with diverse samples and realistic class imbalances, is introduced for evaluation. Extensive experiments show that RoMA significantly outperforms seven competing methods in both adversarial robustness (e.g., achieving over 80% robust accuracy, more than twice that of the next-best method under PGD attacks) and training efficiency (e.g., more than twice as fast as the second-best method at reaching comparable accuracy), while maintaining superior standard accuracy in non-adversarial scenarios.
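A minimal PyTorch sketch of what a single-step adversarial training objective with a consistency term can look like; `eps`, `lam`, and the FGSM-style sign perturbation over a continuous input are stand-ins for illustration, not RoMA's exact perturbation or loss design.

```python
import torch
import torch.nn.functional as F

def single_step_adv_loss(model, x, y, eps=0.02, lam=1.0):
    """One-step adversarial example plus a consistency term aligning the
    predictions on clean and adversarial inputs."""
    x_adv = x.detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad, = torch.autograd.grad(loss, x_adv)
    x_adv = (x + eps * grad.sign()).detach()      # single-step perturbation

    logits_clean, logits_adv = model(x), model(x_adv)
    ce = F.cross_entropy(logits_adv, y)
    consistency = F.kl_div(                       # align clean/adv predictions
        F.log_softmax(logits_adv, dim=-1),
        F.softmax(logits_clean, dim=-1).detach(),
        reduction="batchmean",
    )
    return ce + lam * consistency
```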
https://arxiv.org/abs/2502.07492
In recent years, deep learning with Convolutional Neural Networks (CNNs) has achieved remarkable results in the field of HMER (Handwritten Mathematical Expression Recognition). However, it remains challenging to improve performance with limited labeled training data. This paper presents, for the first time, a simple yet effective semi-supervised HMER framework by introducing dual-branch semi-supervised learning. Specifically, we simplify conventional deep co-training from consistency regularization to cross-supervised learning, where the prediction of one branch is used as a pseudo-label to directly supervise the other branch end-to-end. Considering that the learning of the two branches tends to converge in the later stages of model optimization, we also incorporate a weak-to-strong strategy by applying different levels of augmentation to each branch, which behaves like expanding the training data and improves the quality of network training. Meanwhile, we propose a novel module, the Global Dynamic Counting Module (GDCM), to enhance the performance of the HMER decoder, alleviating recognition inaccuracies in long-distance formula recognition and the occurrence of repeated characters. We release our code at this https URL.
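A sketch of the cross-supervision idea in PyTorch, simplified to plain classification; the weak/strong augmentation split follows the abstract, but the pseudo-label handling and sequence-decoding details of the actual framework are omitted.

```python
import torch
import torch.nn.functional as F

def cross_supervised_loss(branch_a, branch_b, x_weak, x_strong):
    """Each branch pseudo-labels the weakly augmented input; the other
    branch is then trained on the strongly augmented input against that
    pseudo-label, directly end-to-end."""
    with torch.no_grad():
        pseudo_a = branch_a(x_weak).argmax(dim=-1)   # branch A's labels
        pseudo_b = branch_b(x_weak).argmax(dim=-1)   # branch B's labels
    loss_a = F.cross_entropy(branch_a(x_strong), pseudo_b)  # B supervises A
    loss_b = F.cross_entropy(branch_b(x_strong), pseudo_a)  # A supervises B
    return loss_a + loss_b
```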
https://arxiv.org/abs/2502.07172
We present a method for recovering the shape and radiance of a scene consisting of multiple people given solely a few images. Multi-human scenes are complex due to additional occlusion and clutter. For single-human settings, existing approaches using implicit neural representations have achieved impressive results that deliver accurate geometry and appearance. However, it remains challenging to extend these methods for estimating multiple humans from sparse views. We propose a neural implicit reconstruction method that addresses the inherent challenges of this task through the following contributions: First, we propose to use geometry constraints by exploiting pre-computed meshes using a human body model (SMPL). Specifically, we regularize the signed distances using the SMPL mesh and leverage bounding boxes for improved rendering. Second, we propose a ray regularization scheme to minimize rendering inconsistencies, and a saturation regularization for robust optimization in variable illumination. Extensive experiments on both real and synthetic datasets demonstrate the benefits of our approach and show state-of-the-art performance against existing neural reconstruction methods.
https://arxiv.org/abs/2502.07140
Quantizing model weights is critical for reducing the communication and inference costs of large models. However, quantizing models -- especially to low precisions like int4 or int2 -- requires a trade-off in model quality; int2, in particular, is known to severely degrade model quality. Consequently, practitioners are often forced to maintain multiple models with different quantization levels or serve a single model that best satisfies the quality-latency trade-off. On the other hand, integer data types, such as int8, inherently possess a nested (Matryoshka) structure where smaller bit-width integers, like int4 or int2, are nested within the most significant bits. This paper proposes Matryoshka Quantization (MatQuant), a novel multi-scale quantization technique that addresses the challenge of needing multiple quantized models. It allows training and maintaining just one model, which can then be served at different precision levels. Furthermore, due to the co-training and co-distillation regularization provided by MatQuant, the int2 precision models extracted by MatQuant can be up to $10\%$ more accurate than standard int2 quantization (using techniques like QAT or OmniQuant). This represents significant progress in model quantization, demonstrated by the fact that, with the same recipe, an int2 FFN-quantized Gemma-2 9B model is more accurate than an int8 FFN-quantized Gemma-2 2B model.
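The nesting itself is easy to see in code: the int4 and int2 codes are literal bit-slices of the int8 code (shown here for an unsigned/offset representation; MatQuant's contribution is co-training one model so that all slices decode to accurate weights).

```python
import numpy as np

def slice_msb(q_uint8, bits):
    """Extract the `bits` most significant bits of an unsigned 8-bit code,
    illustrating the nested (Matryoshka) structure. Dequantization at the
    lower precision rescales accordingly (by 2 ** (8 - bits))."""
    return q_uint8 >> (8 - bits)

q8 = np.array([0b10110110, 0b01101001], dtype=np.uint8)
q4 = slice_msb(q8, 4)   # [0b1011, 0b0110] -> nested int4 codes
q2 = slice_msb(q8, 2)   # [0b10,   0b01]   -> nested int2 codes
```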
https://arxiv.org/abs/2502.06786
Dynamic graph clustering aims to detect and track time-varying clusters in dynamic graphs, revealing the evolutionary mechanisms of complex real-world dynamic systems. Matrix factorization-based methods are promising approaches for this task; however, these methods often struggle with scalability and can be time-consuming when applied to large-scale dynamic graphs. Moreover, they tend to lack robustness and are vulnerable to real-world noisy data. To address these issues, we make three key contributions. First, to improve scalability, we propose temporal separated matrix factorization, where a single matrix is divided into multiple smaller matrices for independent factorization, resulting in faster computation. Second, to improve robustness, we introduce bi-clustering regularization, which jointly optimizes graph embedding and clustering, thereby filtering out noisy features from the graph embeddings. Third, to further enhance effectiveness and efficiency, we propose selective embedding updating, where we update only the embeddings of dynamic nodes while the embeddings of static nodes are fixed among different timestamps. Experimental results on six synthetic and five real-world benchmarks demonstrate the scalability, robustness and effectiveness of our proposed method. Source code is available at this https URL.
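A rough numpy sketch of the temporal-separation idea: each timestamp's (smaller) adjacency matrix is factorized independently rather than factorizing one large stacked matrix. The bi-clustering regularization and the selective updating of only dynamic nodes are omitted, and the plain gradient loop is an illustrative stand-in for the paper's solver.

```python
import numpy as np

def temporal_separated_mf(adjs, k, n_iter=100, lr=0.01):
    """Factorize each timestamp's symmetric adjacency A ~ U U^T on its own,
    yielding many small, fast problems instead of one large one."""
    embeddings = []
    for A in adjs:                        # one small problem per timestamp
        U = np.random.rand(A.shape[0], k) * 0.1
        for _ in range(n_iter):
            # gradient of 0.5 * ||A - U U^T||_F^2 (up to a factor of 2,
            # absorbed into the learning rate), for symmetric A
            grad = (U @ U.T - A) @ U
            U -= lr * grad
        embeddings.append(U)
    return embeddings
```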
https://arxiv.org/abs/2502.06117
Recent advancements in reinforcement learning (RL) have achieved great success in fine-tuning diffusion-based generative models. However, fine-tuning continuous flow-based generative models to align with arbitrary user-defined reward functions remains challenging, particularly due to issues such as policy collapse from overoptimization and the prohibitively high computational cost of likelihoods in continuous-time flows. In this paper, we propose an easy-to-use and theoretically sound RL fine-tuning method, which we term Online Reward-Weighted Conditional Flow Matching with Wasserstein-2 Regularization (ORW-CFM-W2). Our method integrates RL into the flow matching framework to fine-tune generative models with arbitrary reward functions, without relying on gradients of rewards or filtered datasets. By introducing an online reward-weighting mechanism, our approach guides the model to prioritize high-reward regions in the data manifold. To prevent policy collapse and maintain diversity, we incorporate Wasserstein-2 (W2) distance regularization into our method and derive a tractable upper bound for it in flow matching, effectively balancing exploration and exploitation of policy optimization. We provide theoretical analyses to demonstrate the convergence properties and induced data distributions of our method, establishing connections with traditional RL algorithms featuring Kullback-Leibler (KL) regularization and offering a more comprehensive understanding of the underlying mechanisms and learning behavior of our approach. Extensive experiments on tasks including target image generation, image compression, and text-image alignment demonstrate the effectiveness of our method, where our method achieves optimal policy convergence while allowing controllable trade-offs between reward maximization and diversity preservation.
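A condensed PyTorch sketch of a reward-weighted conditional flow-matching loss under a linear interpolation path; the softmax reward weighting, the velocity-network signature `v_theta(x_t, t)`, and the omission of the W2 regularizer are all simplifications relative to the paper.

```python
import torch

def reward_weighted_cfm_loss(v_theta, x1, r, beta=1.0):
    """Standard CFM regression target for the path x_t = (1-t) x0 + t x1,
    with per-sample weights derived from rewards r (shape: batch)."""
    x0 = torch.randn_like(x1)                         # base noise sample
    t = torch.rand(x1.shape[0], *([1] * (x1.dim() - 1)))
    xt = (1 - t) * x0 + t * x1                        # interpolant
    target = x1 - x0                                  # conditional velocity
    w = torch.softmax(beta * r, dim=0)                # normalized reward weights
    per_sample = ((v_theta(xt, t) - target) ** 2).flatten(1).mean(dim=1)
    return (w * per_sample).sum()
```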
https://arxiv.org/abs/2502.06061
KL-regularized policy optimization has become a workhorse in learning-based decision making, while its theoretical understanding is still very limited. Although recent progress has been made towards settling the sample complexity of KL-regularized contextual bandits, existing sample complexity bounds are either $\tilde{O}(\epsilon^{-2})$ under single-policy concentrability or $\tilde{O}(\epsilon^{-1})$ under all-policy concentrability. In this paper, we propose the \emph{first} algorithm with $\tilde{O}(\epsilon^{-1})$ sample complexity under single-policy concentrability for offline contextual bandits. Our algorithm is designed for general function approximation and based on the principle of \emph{pessimism in the face of uncertainty}. The core of our proof leverages the strong convexity of the KL regularization, and the conditional non-negativity of the gap between the true reward and its pessimistic estimator to refine a mean-value-type risk upper bound to its extreme. This in turn leads to a novel covariance-based analysis, effectively bypassing the need for uniform control over the discrepancy between any two functions in the function class. The near-optimality of our algorithm is demonstrated by an $\tilde{\Omega}(\epsilon^{-1})$ lower bound. Furthermore, we extend our algorithm to contextual dueling bandits and achieve a similar nearly optimal sample complexity.
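For reference, the objective in this line of work is typically the KL-regularized expected reward (stated here in generic notation, which may differ slightly from the paper's):

```latex
J(\pi) \;=\; \mathbb{E}_{x \sim d_0,\, a \sim \pi(\cdot \mid x)}\bigl[r(x,a)\bigr]
\;-\; \eta\, \mathbb{E}_{x \sim d_0}\Bigl[\mathrm{KL}\bigl(\pi(\cdot \mid x)\,\big\|\,\pi_{\mathrm{ref}}(\cdot \mid x)\bigr)\Bigr],
```

where $\pi_{\mathrm{ref}}$ is the reference (behavior) policy and $\eta > 0$ the regularization strength; the strong convexity of the KL term in $\pi$ is exactly the structure the proof exploits.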
https://arxiv.org/abs/2502.06051
Concept-based learning (CBL) enhances prediction accuracy and interpretability by leveraging high-level, human-understandable concepts. However, existing CBL frameworks do not address survival analysis tasks, which involve predicting event times in the presence of censored data, a common scenario in fields like medicine and reliability analysis. To bridge this gap, we propose two novel models: SurvCBM (Survival Concept-based Bottleneck Model) and SurvRCM (Survival Regularized Concept-based Model), which integrate concept-based learning with survival analysis to handle censored event time data. The models employ the Cox proportional hazards model and the Beran estimator. SurvCBM is based on the architecture of the well-known concept bottleneck model, offering interpretable predictions through concept-based explanations. SurvRCM uses concepts as regularization to enhance accuracy. Both models are trained end-to-end and provide interpretable predictions in terms of concepts. Two interpretability approaches are proposed: one leveraging the linear relationship in the Cox model and another using an instance-based explanation framework with the Beran estimator. Numerical experiments demonstrate that SurvCBM outperforms SurvRCM and traditional survival models, underscoring the importance and advantages of incorporating concept information. The code for the proposed algorithms is publicly available.
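A minimal PyTorch sketch of the bottleneck structure (features to concepts to a Cox-style log-hazard score); the layer sizes and sigmoid concept head are assumptions, and the Beran-estimator component is omitted.

```python
import torch
import torch.nn as nn

class SurvCBMSketch(nn.Module):
    """Features -> interpretable concept scores -> Cox log-hazard ratio."""
    def __init__(self, d_in, n_concepts):
        super().__init__()
        self.to_concepts = nn.Sequential(nn.Linear(d_in, n_concepts), nn.Sigmoid())
        self.cox = nn.Linear(n_concepts, 1, bias=False)  # Cox coefficients over concepts

    def forward(self, x):
        c = self.to_concepts(x)            # human-understandable concept scores
        return self.cox(c).squeeze(-1), c  # log-hazard ratio and concepts
```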
https://arxiv.org/abs/2502.05950
In recent years, as a compromise between privacy and performance, few-sample model compression has been widely adopted to deal with limited data resulting from privacy and security concerns. However, when the number of available samples is extremely limited, class imbalance becomes a common and tricky problem. Achieving an equal number of samples across all classes is often costly and impractical in real-world applications, and previous studies on few-sample model compression have mostly ignored this significant issue. Our experiments comprehensively demonstrate that class imbalance negatively affects the overall performance of few-sample model compression methods. To address this problem, we propose a novel and adaptive framework named OOD-Enhanced Few-Sample Model Compression (OE-FSMC). This framework integrates easily accessible out-of-distribution (OOD) data into both the compression and fine-tuning processes, effectively rebalancing the training distribution. We also incorporate a joint distillation loss and a regularization term to reduce the risk of the model overfitting to the OOD data. Extensive experiments on multiple benchmark datasets show that our framework can be seamlessly incorporated into existing few-sample model compression methods, effectively mitigating the accuracy degradation caused by class imbalance.
https://arxiv.org/abs/2502.05832
Hyperedge prediction is a fundamental task to predict future high-order relations based on the observed network structure. Existing hyperedge prediction methods, however, suffer from the data sparsity problem. To alleviate this problem, negative sampling methods can be used, which leverage non-existing hyperedges as contrastive information for model training. However, the following important challenges have been rarely studied: (C1) lack of guidance for generating negatives and (C2) possibility of producing false negatives. To address them, we propose a novel hyperedge prediction method, HyGEN, that employs (1) a negative hyperedge generator that uses positive hyperedges as guidance to generate more realistic ones and (2) a regularization term that prevents the generated hyperedges from being false negatives. Extensive experiments on six real-world hypergraphs reveal that HyGEN consistently outperforms four state-of-the-art hyperedge prediction methods.
https://arxiv.org/abs/2502.05827
How can we effectively remove or "unlearn" undesirable information, such as specific features or individual data points, from a learning outcome while minimizing utility loss and ensuring rigorous guarantees? We introduce a mathematical framework based on information-theoretic regularization to address both feature and data point unlearning. For feature unlearning, we derive a unified solution that simultaneously optimizes diverse learning objectives, including entropy, conditional entropy, KL-divergence, and the energy of conditional probability. For data point unlearning, we first propose a novel definition that serves as a practical condition for unlearning via retraining, is easy to verify, and aligns with the principles of differential privacy from an inference perspective. Then, we provide provable guarantees for our framework on data point unlearning. By combining flexibility in learning objectives with simplicity in regularization design, our approach is highly adaptable and practical for a wide range of machine learning and AI applications.
https://arxiv.org/abs/2502.05684
This study demonstrates a novel use of the U-Net architecture in the field of semantic segmentation to detect landforms using preprocessed satellite imagery. The study applies the U-Net model for effective feature extraction using Convolutional Neural Network (CNN) segmentation techniques. Dropout is used strategically for regularization to improve the model's generalization, and the Adam optimizer is used for effective training. The study thoroughly assesses the performance of the U-Net architecture on a large sample of preprocessed satellite topographical images. The model excels in semantic segmentation tasks, delivering high-resolution outputs, fast feature extraction, and flexibility across a wide range of applications. The findings highlight the U-Net architecture's substantial contribution to the advancement of machine learning and image processing technologies. The U-Net approach, which emphasizes pixel-wise categorization and comprehensive segmentation-map production, is useful in practical applications such as autonomous driving, disaster management, and land-use planning. This study not only investigates the complexities of the U-Net architecture for semantic segmentation, but also highlights its real-world applications in image classification, analysis, and landform identification, demonstrating the model's significance in shaping modern technology.
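For concreteness, a minimal PyTorch encoder block of the kind described, with dropout for regularization and Adam for training; the channel sizes and dropout rate are assumptions.

```python
import torch
import torch.nn as nn

class DownBlock(nn.Module):
    """Minimal U-Net encoder block with dropout regularization."""
    def __init__(self, c_in, c_out, p_drop=0.3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(c_out, c_out, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Dropout2d(p_drop),   # regularization against overfitting
        )
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        skip = self.conv(x)         # kept for the decoder's skip connection
        return self.pool(skip), skip

# Training would pair such blocks with a mirrored decoder and, per the study,
# the Adam optimizer, e.g.:
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```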
https://arxiv.org/abs/2502.05476
Magnetic data inversion is an important tool in geophysics, used to infer subsurface magnetic susceptibility distributions from surface magnetic field measurements. This inverse problem is inherently ill-posed, characterized by non-unique solutions, depth ambiguity, and sensitivity to noise. Traditional inversion approaches rely on predefined regularization techniques to stabilize solutions, limiting their adaptability to complex or diverse geological scenarios. In this study, we propose an approach that integrates variable dictionary learning and scale-space methods to address these challenges. Our method employs learned dictionaries, allowing for adaptive representation of complex subsurface features that are difficult to capture with predefined bases. Additionally, we extend classical variational inversion by incorporating multi-scale representations through a scale-space framework, enabling the progressive introduction of structural detail while mitigating overfitting. We implement both fixed and dynamic dictionary learning techniques, with the latter introducing iteration-dependent dictionaries for enhanced flexibility. Using a synthetic dataset to simulate geological scenarios, we demonstrate significant improvements in reconstruction accuracy and robustness compared to conventional variational and dictionary-based methods. Our results highlight the potential of learned dictionaries, especially when coupled with scale-space dynamics, to improve model recovery and noise handling. These findings underscore the promise of our data-driven approach for advancing magnetic data inversion and its applications in geophysical exploration, environmental assessment, and mineral prospecting.
https://arxiv.org/abs/2502.05451
The primary focus of offline reinforcement learning (RL) is to manage the risk of hazardous exploitation of out-of-distribution actions. An effective approach to achieve this goal is through behavior regularization, which augments conventional RL objectives by incorporating constraints that enforce the policy to remain close to the behavior policy. Nevertheless, existing literature on behavior-regularized RL primarily focuses on explicit policy parameterizations, such as Gaussian policies. Consequently, it remains unclear how to extend this framework to more advanced policy parameterizations, such as diffusion models. In this paper, we introduce BDPO, a principled behavior-regularized RL framework tailored for diffusion-based policies, thereby combining the expressive power of diffusion policies and the robustness provided by regularization. The key ingredient of our method is to calculate the Kullback-Leibler (KL) regularization analytically as the accumulated discrepancies in reverse-time transition kernels along the diffusion trajectory. By integrating the regularization, we develop an efficient two-time-scale actor-critic RL algorithm that produces the optimal policy while respecting the behavior constraint. Comprehensive evaluations conducted on synthetic 2D tasks and continuous control tasks from the D4RL benchmark validate its effectiveness and superior performance.
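The analytic KL computation rests on the standard chain-rule decomposition of the KL divergence between two reverse-time Markov processes that share the same prior at $t = T$ (generic notation, which may differ from the paper's):

```latex
\mathrm{KL}\bigl(\pi \,\|\, \pi_b\bigr)
\;=\; \mathbb{E}_{\pi}\Biggl[\sum_{t=T}^{1} \mathrm{KL}\Bigl(p^{\pi}_{t-1}(\cdot \mid x_t)\,\Big\|\,p^{\pi_b}_{t-1}(\cdot \mid x_t)\Bigr)\Biggr],
```

so the trajectory-level regularizer accumulates per-step discrepancies between the two reverse-time transition kernels, which in DDPM-style parameterizations are Gaussian and hence available in closed form.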
https://arxiv.org/abs/2502.04778
This paper introduces NN-STNE, a neural network that uses t-distributed stochastic neighbor embedding (t-SNE) as a hidden layer to reduce input dimensionality by mapping long time-series data to shapelet membership probabilities. A Gaussian-kernel-based mean-squared error preserves local data structure, while K-means initializes shapelet candidates because of the non-convex optimization challenge. Unlike existing methods, our approach uses t-SNE to address crowding in low-dimensional space and applies L1-norm regularization to optimize shapelet length. Evaluations on the UCR dataset and an electrical-component manipulation task (e.g., switching a component on) demonstrate improved clustering accuracy over state-of-the-art feature-learning methods in robotics.
https://arxiv.org/abs/2502.04167
YOLOv4 achieved the best performance on the COCO dataset by combining advanced techniques for regression (bounding-box positioning) and classification (object-class identification) using the Darknet framework. To enhance accuracy and adaptability, it employs Cross mini-Batch Normalization, Cross-Stage-Partial connections, Self-Adversarial Training, and Weighted Residual Connections, as well as the CIoU loss, Mosaic data augmentation, and DropBlock regularization. With Mosaic augmentation and multi-resolution training, YOLOv4 achieves superior detection in diverse scenarios, attaining 43.5% AP (65.7% AP50) on a Tesla V100 at roughly 65 frames per second, ensuring efficiency, affordability, and adaptability for real-world environments.
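The CIoU loss combines IoU with a normalized center-distance term and an aspect-ratio consistency term; a PyTorch version is sketched below (in practice the trade-off weight `alpha` is usually detached during backprop).

```python
import math
import torch

def ciou_loss(box_p, box_g, eps=1e-7):
    """Complete-IoU loss for box regression; boxes are (x1, y1, x2, y2)
    tensors of shape (N, 4)."""
    # intersection / union
    x1 = torch.max(box_p[:, 0], box_g[:, 0]); y1 = torch.max(box_p[:, 1], box_g[:, 1])
    x2 = torch.min(box_p[:, 2], box_g[:, 2]); y2 = torch.min(box_p[:, 3], box_g[:, 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    wp, hp = box_p[:, 2] - box_p[:, 0], box_p[:, 3] - box_p[:, 1]
    wg, hg = box_g[:, 2] - box_g[:, 0], box_g[:, 3] - box_g[:, 1]
    iou = inter / (wp * hp + wg * hg - inter + eps)
    # squared center distance over squared enclosing-box diagonal
    cw = torch.max(box_p[:, 2], box_g[:, 2]) - torch.min(box_p[:, 0], box_g[:, 0])
    ch = torch.max(box_p[:, 3], box_g[:, 3]) - torch.min(box_p[:, 1], box_g[:, 1])
    c2 = cw ** 2 + ch ** 2 + eps
    rho2 = ((box_p[:, 0] + box_p[:, 2] - box_g[:, 0] - box_g[:, 2]) ** 2
            + (box_p[:, 1] + box_p[:, 3] - box_g[:, 1] - box_g[:, 3]) ** 2) / 4
    # aspect-ratio consistency term
    v = (4 / math.pi ** 2) * (torch.atan(wg / (hg + eps)) - torch.atan(wp / (hp + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v
```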
https://arxiv.org/abs/2502.04161
State-of-the-art image reconstruction often relies on complex, highly parameterized deep architectures. We propose an alternative: a data-driven reconstruction method inspired by the classic Tikhonov regularization. Our approach iteratively refines intermediate reconstructions by solving a sequence of quadratic problems. These updates have two key components: (i) learned filters to extract salient image features, and (ii) an attention mechanism that locally adjusts the penalty of filter responses. Our method achieves performance on par with leading plug-and-play and learned regularizer approaches while offering interpretability, robustness, and convergent behavior. In effect, we bridge traditional regularization and deep learning with a principled reconstruction approach.
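In spirit, each update solves a Tikhonov-style quadratic problem whose penalty is locally re-weighted; a dense-matrix numpy sketch follows, with the learned filters and attention-produced weights supplied as plain arrays (an illustration of the structure, not the paper's implementation).

```python
import numpy as np

def tikhonov_step(A, y, filters, weights):
    """One quadratic update: solve
        min_x ||A x - y||^2 + sum_i ||diag(w_i) (F_i x)||^2,
    where the F_i are (here fixed, in the method learned) filters and the
    w_i are local penalty weights computed at the current iterate."""
    lhs = A.T @ A
    rhs = A.T @ y
    for F_i, w_i in zip(filters, weights):
        WF = np.diag(w_i) @ F_i          # locally re-weighted filter responses
        lhs += WF.T @ WF
    return np.linalg.solve(lhs, rhs)     # closed-form quadratic minimizer
```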
https://arxiv.org/abs/2502.04079
Remote photoplethysmography (rPPG) is a promising technique for monitoring physiological signals, such as heart rate, from facial videos. However, labeled facial videos for this research are challenging to collect. Current rPPG research is mainly based on several small public datasets collected in simple environments, which limits the generalization and scale of AI models. Semi-supervised methods that leverage a small amount of labeled data and abundant unlabeled data can fill this gap for rPPG learning. In this study, a novel semi-supervised learning method named Semi-rPPG, which combines curriculum pseudo-labeling and consistency regularization, is proposed to extract intrinsic physiological features from unlabeled data without the model being impaired by noise. Specifically, a curriculum pseudo-labeling strategy with a signal-to-noise ratio (SNR) criterion is proposed to annotate the unlabeled data while adaptively filtering out low-quality unlabeled samples. Besides, a novel consistency regularization term for quasi-periodic signals is proposed through weakly and strongly augmented clips. To benefit research on semi-supervised rPPG measurement, we establish a novel semi-supervised benchmark for rPPG learning through intra-dataset and cross-dataset evaluation on four public datasets. The proposed Semi-rPPG method achieves the best results compared with three classical semi-supervised methods under different protocols. Ablation studies are conducted to prove the effectiveness of the proposed methods.
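A plausible form of the SNR criterion used to rank pseudo-labels, following the common rPPG convention of comparing spectral power near the dominant heart-rate peak against the rest of the plausible band; the band edges and the thresholding scheme are assumptions, not the paper's exact criterion.

```python
import numpy as np

def snr_db(signal, fs, f_lo=0.7, f_hi=3.0, band=0.1):
    """Power within `band` Hz of the dominant peak in the heart-rate range
    (0.7-3 Hz, roughly 42-180 bpm) vs. the rest of that range, in dB."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    power = np.abs(np.fft.rfft(signal)) ** 2
    hr_mask = (freqs >= f_lo) & (freqs <= f_hi)
    f_peak = freqs[hr_mask][np.argmax(power[hr_mask])]
    sig_mask = hr_mask & (np.abs(freqs - f_peak) <= band)
    noise_mask = hr_mask & ~sig_mask
    return 10 * np.log10(power[sig_mask].sum() / (power[noise_mask].sum() + 1e-12))

# Curriculum pseudo-labeling would then keep only clips whose predicted
# signal clears a (schedule-dependent) threshold, e.g.
# accept = snr_db(pred, fs=30) > threshold
```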
https://arxiv.org/abs/2502.03855