Vision Transformers (ViTs) have shown success across a variety of tasks due to their ability to capture global image representations. Recent studies have identified the existence of high-norm tokens in ViTs, which can interfere with unsupervised object discovery. To address this, the use of "registers", additional tokens that isolate high-norm patch tokens while capturing global image-level information, has been proposed. While registers have been studied extensively for object discovery, their generalization properties, particularly in out-of-distribution (OOD) scenarios, remain underexplored. In this paper, we examine the utility of register token embeddings in providing additional features for improving generalization and anomaly rejection. To that end, we propose a simple method that combines the special CLS token embedding commonly employed in ViTs with the average-pooled register embeddings to create feature representations that are subsequently used for training a downstream classifier. We find that this enhances OOD generalization and anomaly rejection while maintaining in-distribution (ID) performance. Extensive experiments across multiple ViT backbones trained with and without registers reveal consistent improvements of 2-4\% in top-1 OOD accuracy and a 2-3\% reduction in false positive rates for anomaly detection. Importantly, these gains are achieved without additional computational overhead.
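A rough sketch of the proposed feature construction follows. The token layout and the use of concatenation as the combination rule are assumptions; the abstract fixes neither.

```python
import torch

def cls_register_features(tokens: torch.Tensor, num_registers: int = 4) -> torch.Tensor:
    """Combine the CLS embedding with mean-pooled register embeddings.

    Assumes tokens are laid out as [CLS, reg_1..reg_R, patch_1..patch_N]
    along dim 1 (as in ViTs trained with registers); concatenation is one
    plausible way to combine the two embeddings.
    """
    cls_emb = tokens[:, 0]                             # (B, D)
    reg_emb = tokens[:, 1:1 + num_registers].mean(1)   # (B, D)
    return torch.cat([cls_emb, reg_emb], dim=-1)       # (B, 2D) classifier input

# toy check: batch of 2, 4 registers, 16 patches, embedding dim 8
print(cls_register_features(torch.randn(2, 21, 8)).shape)  # torch.Size([2, 16])
```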
https://arxiv.org/abs/2501.04784
Reinforcement learning (RL) in the real world necessitates procedures that enable agents to explore without causing harm to themselves or others. The most successful solutions to the problem of safe RL leverage offline data to learn a safe set, enabling safe online exploration. However, this approach to safe learning is often constrained by the demonstrations available for learning. In this paper, we investigate how the quantity and quality of the data used to train the initial safe set offline influence the ability to learn safe RL policies online. Specifically, we focus on tasks with spatially extended goal states for which we have few or no demonstrations available. Classically, this problem is addressed either by using hand-designed controllers to generate data or by collecting user-generated demonstrations. However, these methods are often expensive and do not scale to more complex tasks and environments. To address this limitation, we propose an unsupervised RL-based offline data collection procedure that learns complex and scalable policies without the need for hand-designed controllers or user demonstrations. Our research demonstrates the significance of providing sufficient demonstrations for agents to learn optimal safe RL policies online, and as a result, we propose optimistic forgetting, a novel online safe RL approach that is practical for scenarios with limited data. Further, our unsupervised data collection approach highlights the need to balance diversity and optimality for safe online exploration.
https://arxiv.org/abs/2501.04481
Discrete tokens extracted from continuous speech representations provide efficient and domain-adaptable speech features. Their application to disordered speech, which exhibits articulation imprecision and a large mismatch against normal voice, remains unexplored. To improve the phonetic discrimination that is weakened during unsupervised K-means or vector quantization of continuous features, this paper proposes novel phone-purity-guided (PPG) discrete tokens for dysarthric speech recognition. Phonetic label supervision is used to regularize the maximum-likelihood and reconstruction-error costs used in standard K-means and VAE-VQ based discrete token extraction. Experiments conducted on the UASpeech corpus suggest that the proposed PPG discrete token features extracted from HuBERT consistently outperform hybrid TDNN and End-to-End (E2E) Conformer systems using non-PPG K-means or VAE-VQ tokens across varying codebook sizes, with statistically significant word error rate (WER) reductions of up to 0.99\% and 1.77\% absolute (3.21\% and 4.82\% relative), respectively, on the UASpeech test set of 16 dysarthric speakers. The lowest WER of 23.25\% was obtained by combining systems using different token features. Consistent improvements on the phone purity metric were also achieved. t-SNE visualization further demonstrates that sharper decision boundaries are produced between K-means/VAE-VQ clusters after introducing phone-purity guidance.
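The purity-guided quantization can be illustrated with a toy K-means variant. The penalty below, an extra cost for joining a cluster whose current majority phone differs from the frame's label, is a hypothetical stand-in for the paper's phonetic-label regularization of the clustering objective.

```python
import numpy as np

def ppg_kmeans(X, phones, k, lam=1.0, iters=10, seed=0):
    """K-means with an illustrative phone-purity penalty (not the paper's
    exact regularizer): each frame pays `lam` extra cost for joining a
    cluster whose majority phone label differs from its own."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), k, replace=False)
    centers, majority = X[idx].copy(), phones[idx].copy()
    for _ in range(iters):
        d = ((X[:, None] - centers[None]) ** 2).sum(-1)    # (n, k) sq. distances
        d = d + lam * (phones[:, None] != majority[None])  # purity penalty
        assign = d.argmin(1)
        for j in range(k):
            members = assign == j
            if members.any():
                centers[j] = X[members].mean(0)
                vals, counts = np.unique(phones[members], return_counts=True)
                majority[j] = vals[counts.argmax()]        # refresh cluster label
    return assign, centers

# toy frames from two overlapping phone classes
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(1.5, 1, (50, 4))])
assign, _ = ppg_kmeans(X, np.repeat([0, 1], 50), k=4)
```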
https://arxiv.org/abs/2501.04379
We provide a general and malleable heuristic for the air conflict resolution problem. This heuristic is based on a new neighborhood structure for searching the solution space of trajectories and flight levels. The core idea of our heuristic is to use unsupervised learning to cluster the conflict points and disperse them across various flight levels. Our first algorithm, called Cluster & Disperse, in each iteration assigns the most problematic flights in each cluster to another flight level. In effect, we shuffle them between the flight levels until we achieve a well-balanced configuration. The Cluster & Disperse algorithm then uses any horizontal-plane conflict resolution algorithm as a subroutine to solve these well-balanced instances. Nevertheless, we develop a novel algorithm for the horizontal plane based on a similar idea: we cluster and disperse the conflict points spatially within the same flight level using gradient descent and a social force. We use a novel maneuver, based on the aviation routine of Radius-to-Fix legs, that makes flights travel on an arc instead of a straight path. Our algorithms can handle a high density of flights within a reasonable computation time. We put their performance in context with some notable algorithms from the literature. Being a general framework, a particular strength of Cluster & Disperse is its malleability, allowing various constraints regarding the aircraft or the environment to be integrated with ease. This is in contrast to models based on, for instance, mixed-integer programming.
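A toy sketch of the dispersal loop follows, with the clustering of conflict points and the horizontal-plane subroutine omitted. The conflict representation (pairs of flights that conflict when sharing a level) and the reassignment rule are simplifications.

```python
from collections import Counter

def cluster_and_disperse(conflicts, levels, n_levels, iters=20):
    """Toy version of the Cluster & Disperse reassignment loop.

    `conflicts` maps a flight id to the flights it conflicts with when
    they share a flight level; `levels` maps flight id -> current level.
    Each iteration moves the most conflicted flight to the level where it
    would conflict least, until a well-balanced configuration is reached.
    """
    for _ in range(iters):
        load = Counter({f: sum(levels[o] == levels[f] for o in others)
                        for f, others in conflicts.items()})
        worst, n = load.most_common(1)[0]
        if n == 0:
            break  # no same-level conflicts remain
        levels[worst] = min(range(n_levels),
                            key=lambda l: sum(levels[o] == l
                                              for o in conflicts[worst]))
    return levels

# three flights, all initially on level 0, flight 0 conflicting with both
print(cluster_and_disperse({0: [1, 2], 1: [0], 2: [0]},
                           {0: 0, 1: 0, 2: 0}, n_levels=3))
```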
https://arxiv.org/abs/2501.04281
We present numerical and analytical results on the formation and stability of a family of fixed points of deep neural networks (DNNs). Such fixed points appear in a class of DNNs where the dimensions of the input and output vectors are the same. We demonstrate examples of applications of such networks in supervised, semi-supervised, and unsupervised learning, such as encoding/decoding of images and restoration of damaged images, among others. We present several numerical and analytical results. First, we show that for untrained DNNs with weights and biases initialized by normally distributed random variables, only one fixed point exists. This result holds for DNNs with any depth (number of layers) $L$, any layer width $N$, and sigmoid-type activation functions. Second, it has been shown that for a DNN whose parameters (weights and biases) are initialized by a ``light-tailed'' distribution of weights (e.g., a normal distribution), after training the distribution of these parameters becomes ``heavy-tailed''. This motivates our study of DNNs with ``heavy-tailed'' initialization. For such DNNs we show numerically that training leads to the emergence of $Q(N,L)$ fixed points, where $Q(N,L)$ is a positive integer that depends on the number of layers $L$ and the layer width $N$. We further observe numerically that for fixed $N = N_0$ the function $Q(N_0, L)$ is non-monotone: it initially grows as $L$ increases and then decreases to 1. This non-monotone behavior of $Q(N_0, L)$ is also obtained by analytically deriving the equation for the Empirical Spectral Distribution (ESD) of the input-output Jacobian, followed by a numerical solution of this equation.
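The single-fixed-point claim for Gaussian initialization is easy to probe numerically. A minimal sketch, where the 1/sqrt(N) weight scaling and the Cauchy draw as a stand-in for "heavy-tailed" parameters are our assumptions:

```python
import numpy as np

def random_dnn(L, N, seed=0, heavy_tailed=False):
    """x -> f(x): depth-L, width-N network with sigmoid activations.

    Weights/biases are Gaussian ("light-tailed") or Cauchy (a crude
    stand-in for the "heavy-tailed" case), scaled by 1/sqrt(N).
    """
    rng = np.random.default_rng(seed)
    draw = rng.standard_cauchy if heavy_tailed else rng.standard_normal
    Ws = [draw((N, N)) / np.sqrt(N) for _ in range(L)]
    bs = [draw(N) for _ in range(L)]
    def f(x):
        for W, b in zip(Ws, bs):
            x = 1.0 / (1.0 + np.exp(-(W @ x + b)))  # sigmoid layer
        return x
    return f

def fixed_point(f, N, tol=1e-12, max_iter=10_000, seed=1):
    """Locate x* = f(x*) by plain iteration (converges when f contracts)."""
    x = np.random.default_rng(seed).standard_normal(N)
    for _ in range(max_iter):
        x_new = f(x)
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x

f = random_dnn(L=5, N=10)
x_star = fixed_point(f, N=10)
print(np.linalg.norm(f(x_star) - x_star))  # ~0 at a fixed point
```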
https://arxiv.org/abs/2501.04182
Nanoparticle superlattices, consisting of ordered arrangements of nanoparticles, exhibit unique optical, magnetic, and electronic properties arising from nanoparticle characteristics as well as their collective behaviors. Understanding how processing conditions influence the nanoscale arrangement and microstructure is critical for engineering materials with desired macroscopic properties. Microstructural features such as grain boundaries, lattice defects, and pores significantly affect these properties but are challenging to quantify using traditional manual analyses, which are labor-intensive and prone to errors. In this work, we present a machine learning workflow for automating grain segmentation in scanning electron microscopy (SEM) images of nanoparticle superlattices. This workflow integrates signal processing techniques, such as Radon transforms, with unsupervised learning methods like agglomerative hierarchical clustering to identify and segment grains without requiring manually annotated data. In the workflow, we transform the raw pixel data into an explainable numerical representation of superlattice orientations for clustering. Benchmarking results demonstrate the workflow's robustness against noisy images and edge cases, with a processing speed of four images per minute on standard computational hardware. This efficiency makes the workflow scalable to large datasets and makes it a valuable tool for integrating data-driven models into decision-making processes for material design and analysis. For example, one can use this workflow to quantify grain size distributions at varying processing conditions like temperature and pressure and use that knowledge to adjust processing conditions to achieve desired superlattice orientations and grain sizes.
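A compressed sketch of the workflow's two ingredients, Radon-based orientation features and agglomerative clustering. Patch size, angle grid, and the clustering threshold are illustrative choices, and the spatial-connectivity step a real grain map needs is omitted.

```python
import numpy as np
from skimage.transform import radon
from scipy.cluster.hierarchy import fcluster, linkage

def patch_orientation(patch, angles=np.arange(0, 180, 2)):
    """Dominant stripe orientation of a patch: the projection angle whose
    Radon profile varies most runs parallel to the lattice rows."""
    sino = radon(patch, theta=angles, circle=False)  # (n_rays, n_angles)
    return angles[sino.var(axis=0).argmax()]

def segment_grains(img, patch=32, max_angle_gap=5.0):
    """Per-patch orientations, then agglomerative clustering of patches
    with similar orientation (0/180 degree wrap-around ignored here)."""
    h, w = img.shape
    feats, coords = [], []
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            feats.append([patch_orientation(img[i:i + patch, j:j + patch])])
            coords.append((i, j))
    Z = linkage(np.array(feats), method="average")
    labels = fcluster(Z, t=max_angle_gap, criterion="distance")
    return dict(zip(coords, labels))

print(segment_grains(np.random.rand(128, 128)))
```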
https://arxiv.org/abs/2501.04172
In this paper, we introduce an unsupervised approach for Speech Segmentation, which builds on previously researched approaches, e.g., Speaker Diarization, while being applicable to an inclusive set of acoustic-semantic distinctions, paving a path towards a general Unsupervised Speech Segmentation approach. Unlike traditional speech and audio segmentation, which mainly focuses on spectral changes in the input signal, e.g., phone segmentation, our approach tries to segment the spoken utterance into chunks with differing acoustic-semantic styles, focusing on acoustic-semantic information that does not translate well into text, e.g., emotion or speaker. While most Speech Segmentation tasks only handle one style change, e.g., emotion diarization, our approach tries to handle multiple acoustic-semantic style changes. Leveraging recent advances in Speech Language Models (SLMs), we propose a simple unsupervised method to segment a given speech utterance. We empirically demonstrate the effectiveness of the proposed approach by considering several setups. Results suggest that the proposed method is superior to the evaluated baselines on boundary detection, segment purity, and over-segmentation. Code is available at this https URL.
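A hypothetical reduction of the idea to code: boundaries are placed where adjacent windows of frame embeddings diverge. Plain cosine distance is a stand-in for the paper's SLM-based scoring, and the window and threshold are arbitrary.

```python
import numpy as np

def segment_by_style_change(frame_embs, win=20, thresh=0.35):
    """Place a boundary wherever the cosine distance between the mean
    embeddings of the left and right windows spikes; nearby detections
    are merged into a single boundary."""
    def unit(v):
        return v / (np.linalg.norm(v) + 1e-8)
    bounds = []
    for t in range(win, len(frame_embs) - win):
        left = unit(frame_embs[t - win:t].mean(0))
        right = unit(frame_embs[t:t + win].mean(0))
        if 1.0 - left @ right > thresh:
            bounds.append(t)
    return [b for i, b in enumerate(bounds)
            if i == 0 or b - bounds[i - 1] > win]

print(segment_by_style_change(np.random.randn(200, 16)))
```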
https://arxiv.org/abs/2501.03711
The emergence of virtual staining technology provides a rapid and efficient alternative for researchers in tissue pathology. It enables the utilization of unlabeled microscopic samples to generate virtual replicas of chemically stained histological slices, or facilitates the transformation of one staining type into another. The remarkable performance of generative networks such as CycleGAN offers an unsupervised learning approach for virtual staining, overcoming the limitation of the high-quality paired data required in supervised learning. Nevertheless, large-scale color transformation necessitates processing large field-of-view images in patches, often resulting in significant boundary inconsistency and artifacts. Additionally, the transformation between different colorized modalities typically requires further effort to modify loss functions and tune hyperparameters for the independent training of networks. In this study, we introduce a general virtual staining framework that is adaptable to various conditions. We propose a loss function based on a value-mapping constraint to ensure the accuracy of virtual staining between different pathological modalities, termed the Value Mapping Generative Adversarial Network (VM-GAN). Meanwhile, we present a confidence-based tiling method to address the boundary inconsistency arising from patch-wise processing. Experimental results on diverse data with varying staining protocols demonstrate that our method achieves superior quantitative indicators and improved visual perception.
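The confidence-based tiling can be sketched as weighted patch blending. The linear border taper and the per-patch confidence values below are assumptions, not the paper's definition.

```python
import numpy as np

def border_taper(p):
    """Weight that decays linearly from the patch center to its borders."""
    ramp = np.minimum(np.arange(1, p + 1), np.arange(p, 0, -1))
    return np.outer(ramp, ramp).astype(float)

def blend_patches(patches, coords, conf, out_shape):
    """Average overlapping output patches with weights
    conf[k] * border_taper, so low-confidence patches and patch borders
    contribute less and seams are suppressed."""
    out, weight = np.zeros(out_shape), np.zeros(out_shape)
    for patch, (i, j), c in zip(patches, coords, conf):
        p = patch.shape[0]
        w = c * border_taper(p)
        out[i:i + p, j:j + p] += w * patch
        weight[i:i + p, j:j + p] += w
    return out / np.maximum(weight, 1e-8)

tiles = [np.random.rand(32, 32) for _ in range(4)]
img = blend_patches(tiles, [(0, 0), (0, 16), (16, 0), (16, 16)],
                    conf=[1.0, 0.8, 0.9, 1.0], out_shape=(48, 48))
```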
https://arxiv.org/abs/2501.03592
In this paper, we propose ProTracker, a novel framework for robust and accurate long-term dense tracking of arbitrary points in videos. The key idea of our method is to incorporate probabilistic integration to refine multiple predictions from both optical flow and semantic features for robust short-term and long-term tracking. Specifically, we integrate optical flow estimations in a probabilistic manner, producing smooth and accurate trajectories by maximizing the likelihood of each prediction. To effectively re-localize challenging points that disappear and reappear due to occlusion, we further incorporate long-term feature correspondence into our flow predictions for continuous trajectory generation. Extensive experiments show that ProTracker achieves state-of-the-art performance among unsupervised and self-supervised approaches, and even outperforms supervised methods on several benchmarks. Our code and model will be publicly available upon publication.
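Under an independent-Gaussian assumption, maximum-likelihood fusion of several point predictions reduces to a precision-weighted average; a minimal sketch of that generic step (ProTracker's actual integration over flow and feature estimates is richer):

```python
import numpy as np

def fuse_predictions(positions, sigmas):
    """Maximum-likelihood fusion of 2D point predictions with known
    per-prediction noise levels: the precision-weighted mean."""
    w = 1.0 / np.asarray(sigmas, float) ** 2  # precision of each estimate
    w = w / w.sum()
    return (w[:, None] * np.asarray(positions, float)).sum(0)

# three trackers vote on the same point; the sharper ones dominate
print(fuse_predictions([(10.0, 5.0), (10.4, 5.2), (12.0, 6.0)],
                       [0.5, 0.8, 3.0]))
```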
https://arxiv.org/abs/2501.03220
Diffusion models have demonstrated their utility as learned priors for solving various inverse problems. However, most existing approaches are limited to linear inverse problems. This paper exploits the efficient and unsupervised posterior sampling framework of Denoising Diffusion Restoration Models (DDRM) for the solution of the nonlinear phase retrieval problem, which requires reconstructing an image from noisy intensity-only measurements such as the Fourier intensity. The approach combines model-based alternating-projection methods with DDRM to utilize pretrained unconditional diffusion priors for phase retrieval. The performance is demonstrated through both simulations and experimental data. Results demonstrate the potential of this approach for improving alternating-projection methods, as well as its limitations.
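A minimal error-reduction loop shows the alternating-projection skeleton that the paper augments with DDRM's diffusion prior; here the prior step is replaced by a crude nonnegativity projection.

```python
import numpy as np

def alternating_projection(y_mag, x0, iters=200):
    """Error-reduction phase retrieval: alternately enforce the measured
    Fourier magnitude and an image-domain constraint. In the paper, a
    DDRM denoising step would act as the image-domain prior; here it is
    simply x >= 0."""
    x = x0.copy()
    for _ in range(iters):
        X = np.fft.fft2(x)
        X = y_mag * np.exp(1j * np.angle(X))  # keep phase, fix magnitude
        x = np.fft.ifft2(X).real
        x = np.maximum(x, 0.0)                # crude image-domain prior
    return x

truth = np.zeros((32, 32)); truth[12:20, 10:22] = 1.0
x_hat = alternating_projection(np.abs(np.fft.fft2(truth)),
                               np.random.rand(32, 32))
```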
https://arxiv.org/abs/2501.03030
Outlier detection refers to the identification of anomalous samples that deviate significantly from the distribution of normal data and has been extensively studied and used in a variety of practical tasks. However, most unsupervised outlier detection methods are carefully designed to detect specified outliers, while real-world data may be entangled with different types of outliers. In this study, we propose a fuzzy rough sets-based multi-scale outlier detection method to identify various types of outliers. Specifically, a novel fuzzy rough sets-based method that integrates relative fuzzy granule density is first introduced to improve the capability of detecting local outliers. Then, a multi-scale view generation method based on granular-ball computing is proposed to collaboratively identify group outliers at different levels of granularity. Moreover, reliable outliers and inliers determined by the three-way decision are used to train a weighted support vector machine to further improve the performance of outlier detection. The proposed method innovatively transforms unsupervised outlier detection into a semi-supervised classification problem and, for the first time, explores fuzzy rough sets-based outlier detection from the perspective of multi-scale granular balls, allowing for high adaptability to different types of outliers. Extensive experiments carried out on both artificial and UCI datasets demonstrate that the proposed outlier detection method significantly outperforms the state-of-the-art methods, improving the results by at least 8.48% in terms of the Area Under the ROC Curve (AUROC) index. The source code is released at this https URL.
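The final semi-supervised stage can be sketched as follows. The fuzzy-rough multi-scale scores are assumed given, and the two thresholds are made-up stand-ins for the three-way decision.

```python
import numpy as np
from sklearn.svm import SVC

def three_way_svm(X, scores, lo=0.2, hi=0.8):
    """Scores below `lo` are treated as reliable inliers, above `hi` as
    reliable outliers, and the rest deferred; a weighted SVM is then fit
    on the reliable points, with outliers up-weighted since they are
    scarce, to produce refined outlier scores for all samples."""
    inlier, outlier = scores <= lo, scores >= hi
    keep = inlier | outlier
    y = outlier[keep].astype(int)
    w = np.where(y == 1, 5.0, 1.0)  # emphasize the scarce outlier class
    clf = SVC(kernel="rbf").fit(X[keep], y, sample_weight=w)
    return clf.decision_function(X)

X = np.random.randn(200, 5)
refined = three_way_svm(X, scores=np.random.rand(200))
```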
https://arxiv.org/abs/2501.02975
Tomato anomalies and damage pose a significant challenge in greenhouse farming. While this method of cultivation benefits from efficient resource utilization, anomalies can significantly degrade the quality of farm produce. A common anomaly associated with tomatoes is splitting, characterized by the development of cracks on the tomato skin, which degrades its quality. Detecting this type of anomaly is challenging due to dynamic variations in appearance and size, compounded by dataset scarcity. We address this problem in an unsupervised manner by utilizing a tailored variational autoencoder (VAE) with hyperspectral input. Preliminary analysis of the dataset enabled us to select the optimal range of wavelengths for detecting this anomaly. Our findings indicate that the 530-550 nm range is suitable for identifying tomato dry splits. The analysis of the reconstruction loss allows us not only to detect the anomalies but also, to some degree, to estimate the anomalous regions.
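A sketch of the detection step, assuming a trained VAE's reconstruction is already available. The band slice stands in for the selected 530-550 nm range, and the quantile threshold is illustrative.

```python
import numpy as np

def anomaly_map(cube, recon, band_slice, q=0.99):
    """Score each pixel by its reconstruction error restricted to the
    informative bands, then flag the top quantile as anomalous."""
    err = ((cube - recon)[..., band_slice] ** 2).mean(-1)  # per-pixel MSE
    return err, err > np.quantile(err, q)

cube = np.random.rand(64, 64, 30)            # H x W x bands toy cube
recon = cube + 0.01 * np.random.randn(*cube.shape)
recon[20:28, 20:28] += 0.3                   # a region the VAE fails to fit
err, mask = anomaly_map(cube, recon, band_slice=slice(10, 14))
```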
https://arxiv.org/abs/2501.02921
Recent successes in self-supervised learning (SSL) model spatial co-occurrences of visual features either by masking portions of an image or by aggressively cropping it. Here, we propose a new way to model spatial co-occurrences by aligning local representations (before pooling) with a global image representation. We present CO-SSL, a family of instance discrimination methods, and show that it outperforms previous methods on several datasets, including ImageNet-1K, where it achieves 71.5% Top-1 accuracy with 100 pre-training epochs. CO-SSL is also more robust to noise corruption, internal corruption, small adversarial attacks, and large training crop sizes. Our analysis further indicates that CO-SSL learns highly redundant local representations, which offers an explanation for its robustness. Overall, our work suggests that aligning local and global representations may be a powerful principle of unsupervised category learning.
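The local-to-global alignment at the heart of the idea can be written as a short cosine loss; projection heads and the instance-discrimination wrapper of the full method are omitted.

```python
import torch
import torch.nn.functional as F

def local_global_alignment(local_feats, global_feat):
    """Pull every pre-pooling local vector toward the image-level
    representation via cosine similarity; the loss is 0 when all local
    vectors are perfectly aligned with the global one."""
    local = F.normalize(local_feats, dim=-1)   # (B, N, D) local tokens
    glob = F.normalize(global_feat, dim=-1)    # (B, D) pooled representation
    sim = (local * glob.unsqueeze(1)).sum(-1)  # (B, N) cosine similarities
    return (1.0 - sim).mean()

loss = local_global_alignment(torch.randn(4, 49, 128), torch.randn(4, 128))
```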
https://arxiv.org/abs/2501.02860
Occlusions are a significant challenge to human pose estimation algorithms, often resulting in inaccurate and anatomically implausible poses. Although current occlusion-robust human pose estimation algorithms exhibit impressive performance on existing datasets, their success is largely attributed to supervised training and the availability of additional information, such as multiple views or temporal continuity. Furthermore, these algorithms typically suffer from performance degradation under distribution shifts. While existing domain adaptive human pose estimation algorithms address this bottleneck, they tend to perform suboptimally when the target domain images are occluded, a common occurrence in real-life scenarios. To address these challenges, we propose OR-POSE: Unsupervised Domain Adaptation for Occlusion Resilient Human POSE Estimation. OR-POSE is an innovative unsupervised domain adaptation algorithm that effectively mitigates domain shifts and overcomes occlusion challenges by employing the mean teacher framework for iterative pseudo-label refinement. Additionally, OR-POSE reinforces realistic pose prediction by leveraging a learned human pose prior that incorporates the anatomical constraints of humans in the adaptation process. Lastly, OR-POSE avoids overfitting to inaccurate pseudo-labels generated from heavily occluded images by employing a novel visibility-based curriculum learning approach. This enables the model to gradually transition from training samples with relatively little occlusion to more challenging, heavily occluded samples. Extensive experiments show that OR-POSE outperforms existing analogous state-of-the-art algorithms by $\sim$7% on challenging occluded human pose estimation datasets.
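Two ingredients are compact enough to sketch: the mean-teacher EMA update and a hypothetical linear visibility curriculum. The schedule and the per-sample visibility scores are assumptions.

```python
import torch

@torch.no_grad()
def ema_update(teacher: torch.nn.Module, student: torch.nn.Module, m=0.999):
    """Mean-teacher update: teacher weights track an exponential moving
    average of the student's, stabilizing the pseudo-labels."""
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(m).add_(ps, alpha=1.0 - m)

def visibility_curriculum(indices, visibility, epoch, total_epochs, start=0.3):
    """Begin with the least occluded samples and widen the training pool
    linearly each epoch; `visibility` in [0, 1] per sample is assumed to
    be precomputed."""
    frac = min(1.0, start + (1.0 - start) * epoch / total_epochs)
    order = sorted(indices, key=lambda i: -visibility[i])
    return order[:max(1, int(len(order) * frac))]

teacher, student = torch.nn.Linear(4, 2), torch.nn.Linear(4, 2)
ema_update(teacher, student)
print(visibility_curriculum(list(range(6)), [0.9, 0.2, 0.7, 0.5, 1.0, 0.1],
                            epoch=1, total_epochs=10))
```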
https://arxiv.org/abs/2501.02773
Word sense disambiguation (WSD) is one of the main challenges in Computational Linguistics. TreeMatch is a WSD system originally developed using data from SemEval 2007 Task 7 (Coarse-grained English All-words Task) that has been adapted for use in SemEval 2010 Task 17 (All-words Word Sense Disambiguation on a Specific Domain). The system is based on a fully unsupervised method using dependency knowledge drawn from a domain-specific knowledge base that was built for this task. When evaluated on the task, the system's precision is above the Most Frequent Selection baseline.
https://arxiv.org/abs/2501.02546
This article investigates the critical issue of dataset bias in medical imaging, with a particular emphasis on racial disparities caused by uneven population distribution in dataset collection. Our analysis reveals that medical segmentation datasets are significantly biased, primarily influenced by the demographic composition of their collection sites. For instance, Scanning Laser Ophthalmoscopy (SLO) fundus datasets collected in the United States predominantly feature images of White individuals, with minority racial groups underrepresented. This imbalance can result in biased model performance and inequitable clinical outcomes, particularly for minority populations. To address this challenge, we propose a novel training set search strategy aimed at reducing these biases by focusing on underrepresented racial groups. Our approach utilizes existing datasets and employs a simple greedy algorithm to identify source images that closely match the target domain distribution. By selecting training data that aligns more closely with the characteristics of minority populations, our strategy improves the accuracy of medical segmentation models on specific minority groups, i.e., Black individuals. Our experimental results demonstrate the effectiveness of this approach in mitigating bias. We also discuss the broader societal implications, highlighting how addressing these disparities can contribute to more equitable healthcare outcomes.
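A minimal version of the greedy search, matching only the mean of an image-feature distribution; the paper's matching criterion may be richer.

```python
import numpy as np

def greedy_training_set(source_feats, target_mean, k):
    """Repeatedly add the source image whose inclusion moves the running
    mean of the selected features closest to the target-domain mean, a
    simple proxy for matching the target distribution."""
    chosen, remaining = [], list(range(len(source_feats)))
    total = np.zeros_like(target_mean)
    for _ in range(k):
        best = min(remaining, key=lambda i: np.linalg.norm(
            (total + source_feats[i]) / (len(chosen) + 1) - target_mean))
        chosen.append(best)
        remaining.remove(best)
        total += source_feats[best]
    return chosen

src = np.random.randn(500, 16)               # per-image feature vectors
tgt_mean = np.random.randn(16) * 0.2         # target-domain feature mean
idx = greedy_training_set(src, tgt_mean, k=50)
```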
https://arxiv.org/abs/2501.02442
Semantic segmentation is a computer vision task where classification is performed at the pixel level. Due to this, the process of labeling images for semantic segmentation is time-consuming and expensive. To mitigate this cost there has been a surge in the use of synthetically generated data, usually created using simulators or video games, which, in combination with domain adaptation methods, can effectively teach models how to segment real data. Still, these datasets have a particular limitation: due to their closed-set nature, it is not possible to include novel classes without modifying the tool used to generate them, which is often not public. Concurrently, generative models have made remarkable progress, particularly with the introduction of diffusion models, enabling the creation of high-quality images from text prompts without additional supervision. In this work, we propose an unsupervised pipeline that leverages Stable Diffusion and the Segment Anything Model (SAM) to generate class examples with an associated segmentation mask, and a method to integrate generated cutouts for novel classes into semantic segmentation datasets, all with minimal user input. Our approach aims to improve the performance of unsupervised domain adaptation methods by introducing novel samples into the training data without modifications to the underlying algorithms. With our methods, we show how models can not only effectively learn how to segment novel classes, with an average performance of 51% IoU, but also reduce errors for other, already existing classes, reaching a higher performance level overall.
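The integration step can be sketched independently of the generation stage: the cutout and its mask are assumed to come from Stable Diffusion and SAM upstream, and a hard paste replaces whatever placement policy and blending the pipeline may use.

```python
import numpy as np

def paste_cutout(image, labels, cutout, mask, class_id, top_left):
    """Paste a generated cutout into a training image and stamp its
    class into the label map, extending the dataset with a novel class."""
    i, j = top_left
    h, w = mask.shape
    region = (slice(i, i + h), slice(j, j + w))
    image[region][mask] = cutout[mask]   # copy only the masked pixels
    labels[region][mask] = class_id      # and label them as the new class
    return image, labels

img = np.zeros((128, 128, 3)); lab = np.zeros((128, 128), dtype=int)
cut = np.ones((32, 32, 3)); m = np.zeros((32, 32), bool); m[8:24, 8:24] = True
img, lab = paste_cutout(img, lab, cut, m, class_id=7, top_left=(40, 40))
```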
https://arxiv.org/abs/2501.02264
Quantum annealing has garnered significant attention as a meta-heuristic inspired by quantum physics for combinatorial optimization problems. Among its many applications, nonnegative/binary matrix factorization stands out for its complexity and relevance in unsupervised machine learning. The use of reverse annealing, a derivative of quantum annealing that prioritizes the search in the vicinity of a given initial state, helps improve optimization performance in matrix factorization. This study proposes an improved strategy that integrates reverse annealing with a linear programming relaxation technique. Using relaxed solutions as the initial configuration for reverse annealing, we demonstrate improvements in optimization performance comparable to exact optimization methods. Our experiments on facial image datasets show that our method provides better convergence than known reverse annealing methods. Furthermore, we investigate the effectiveness of relaxation-based initialization methods on randomized datasets, demonstrating a relationship between the relaxed solution and the optimal solution. This research underscores the potential of combining reverse annealing and classical optimization strategies to enhance optimization performance.
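A relaxation-based initialization in the spirit of the paper, sketched for the binary factor of X ~ W @ H. Box-constrained least squares stands in for the paper's linear-programming relaxation (a simplification), and the rounded matrix would seed reverse annealing on the annealer.

```python
import numpy as np
from scipy.optimize import lsq_linear

def relaxed_initial_state(X, W):
    """Relax each binary column of H to [0, 1], solve the resulting
    box-constrained least-squares problems, and round to produce the
    initial binary configuration for reverse annealing."""
    H_relax = np.column_stack(
        [lsq_linear(W, X[:, j], bounds=(0.0, 1.0)).x
         for j in range(X.shape[1])])
    return (H_relax > 0.5).astype(int), H_relax

W = np.random.rand(20, 5)
H_true = (np.random.rand(5, 8) > 0.5).astype(int)
H0, H_relax = relaxed_initial_state(W @ H_true, W)
print((H0 == H_true).mean())  # often near 1 on easy noiseless instances
```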
https://arxiv.org/abs/2501.02114
Water Distribution Networks (WDNs) are vital infrastructures, and contamination poses serious public health risks. Harmful substances can interact with disinfectants like chlorine, making chlorine monitoring essential for detecting contaminants. However, chlorine sensors often become unreliable and require frequent calibration. This study introduces the Dual-Threshold Anomaly and Drift Detection (AD&DD) method, an unsupervised approach combining a dual-threshold drift detection mechanism with an LSTM-based Variational Autoencoder (LSTM-VAE) for real-time contamination detection. Tested on two realistic WDNs, AD&DD effectively identifies anomalies while treating sensor offsets as concept drift, and outperforms other methods. A proposed decentralized architecture enables accurate contamination detection and localization by deploying AD&DD on selected nodes.
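A sketch of the dual-threshold logic on top of reconstruction errors. The LSTM-VAE itself is assumed trained, and the thresholds and smoothing window are illustrative, not the paper's calibration.

```python
import numpy as np

def dual_threshold_detect(errors, t_anom, t_drift, window=48):
    """A reconstruction error above `t_anom` flags a contamination-like
    anomaly; a sustained moving-average excess over the lower `t_drift`
    flags sensor drift that calls for recalibration instead."""
    errors = np.asarray(errors, float)
    anomalies = errors > t_anom
    ma = np.convolve(errors, np.ones(window) / window, mode="same")
    drift = ma > t_drift
    return anomalies, drift

err = np.abs(np.random.randn(500)) * 0.1
err[200:210] += 2.0          # short contamination-like spike
err[350:] += 0.5             # slow sensor offset (concept drift)
anom, drift = dual_threshold_detect(err, t_anom=1.0, t_drift=0.4)
```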
https://arxiv.org/abs/2501.02107
Objective: To propose and validate an unsupervised MRI reconstruction method that does not require fully sampled k-space data. Materials and Methods: The proposed method, deep image prior with structured sparsity (DISCUS), extends the deep image prior (DIP) by introducing group sparsity to frame-specific code vectors, enabling the discovery of a low-dimensional manifold for capturing temporal variations. DISCUS was validated using four studies: (I) simulation of a dynamic Shepp-Logan phantom to demonstrate its manifold discovery capabilities, (II) comparison with compressed sensing and DIP-based methods using simulated single-shot late gadolinium enhancement (LGE) image series from six distinct digital cardiac phantoms in terms of normalized mean square error (NMSE) and structural similarity index measure (SSIM), (III) evaluation on retrospectively undersampled single-shot LGE data from eight patients, and (IV) evaluation on prospectively undersampled single-shot LGE data from eight patients, assessed via blind scoring from two expert readers. Results: DISCUS outperformed competing methods, demonstrating superior reconstruction quality in terms of NMSE and SSIM (Studies I-III) and expert reader scoring (Study IV). Discussion: An unsupervised image reconstruction method is presented and validated on simulated and measured data. These developments can benefit applications where acquiring fully sampled data is challenging.
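The structured-sparsity ingredient is a group (L2,1) penalty on the frame-specific code vectors, added to the usual DIP data-fidelity loss. A minimal sketch with an assumed grouping of each code into fixed-size groups:

```python
import torch

def group_sparsity_penalty(codes, lam=0.01):
    """L2,1 penalty on frame-specific code vectors: each code is split
    into groups and the sum of group L2 norms drives whole groups to
    zero, exposing a low-dimensional temporal manifold. The group size
    is an assumed hyperparameter."""
    # codes: (frames, groups, group_dim)
    return lam * codes.norm(dim=-1).sum()

codes = torch.randn(16, 8, 4, requires_grad=True)  # 16 frames, 8 groups of 4
loss = group_sparsity_penalty(codes)
loss.backward()  # would be added to the DIP reconstruction loss in training
```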
https://arxiv.org/abs/2501.01482