Image edge detection (ED) faces a fundamental mismatch between training and inference: models are trained using continuous-valued outputs but evaluated using binary predictions. This misalignment, caused by the non-differentiability of binarization, weakens the link between learning objectives and actual task performance. In this paper, we propose a theoretical method to design a Binarization-Aware Adjuster (BAA), which explicitly incorporates binarization behavior into gradient-based optimization. At the core of BAA is a novel loss adjustment mechanism based on a Distance Weight Function (DWF), which reweights pixel-wise contributions according to their correctness and proximity to the decision boundary. This emphasizes decision-critical regions while down-weighting less influential ones. We also introduce a self-adaptive procedure to estimate the optimal binarization threshold for BAA, further aligning training dynamics with inference behavior. Extensive experiments across various architectures and datasets demonstrate the effectiveness of our approach. Beyond ED, BAA offers a generalizable strategy for bridging the gap between continuous optimization and discrete evaluation in structured prediction tasks.
https://arxiv.org/abs/2506.12460
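The abstract above describes reweighting pixel-wise loss terms by correctness and by proximity to the binarization threshold. A minimal sketch of that idea follows. The exponential weight, the `sharpness` parameter, and the function names are illustrative assumptions, not the paper's actual Distance Weight Function. In a gradient-based framework the weights would normally be treated as constants during backpropagation.

```python
import math

def distance_weight(p, y, threshold=0.5, sharpness=10.0):
    """Illustrative weight (an assumption, not the paper's DWF).

    Pixels on the wrong side of the threshold get weight 1.0; correctly
    binarized pixels get a weight that decays with distance from the threshold.
    """
    predicted_edge = p >= threshold
    if predicted_edge != bool(y):
        return 1.0
    return math.exp(-sharpness * abs(p - threshold))

def weighted_bce(probs, labels, threshold=0.5, sharpness=10.0, eps=1e-7):
    """Binary cross-entropy averaged with the illustrative weights above."""
    total, weight_sum = 0.0, 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        w = distance_weight(p, y, threshold, sharpness)
        bce = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        total += w * bce
        weight_sum += w
    return total / weight_sum
```

With these assumed weights, a confident correct pixel (p = 0.9, label 1) contributes far less than a borderline or misclassified one, which is the qualitative behavior the abstract describes.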
The adoption of neural network models in medical imaging has been constrained by strict privacy regulations, limited data availability, high acquisition costs, and demographic biases. Deep generative models offer a promising solution by generating synthetic data that bypasses privacy concerns and addresses fairness by producing samples for under-represented groups. However, unlike natural images, medical imaging requires validation not only for fidelity (e.g., Fréchet Inception Score) but also for morphological and clinical accuracy. This is particularly true for colour fundus retinal imaging, which requires precise replication of the retinal vascular network, including vessel topology, continuity, and thickness. In this study, we investigated whether a distance-based loss function based on deep activation layers of a large foundational model trained on a large corpus of domain data (colour fundus imaging) offers advantages over perceptual and edge-detection-based loss functions. Our extensive validation pipeline, based on both domain-free and domain-specific tasks, suggests that domain-specific deep features do not improve autoencoder image generation. Conversely, our findings highlight the effectiveness of conventional edge detection filters in improving the sharpness of vascular structures in synthetic samples.
https://arxiv.org/abs/2506.11753
Image segmentation is a fundamental task in computer vision aimed at delineating object boundaries within images. Traditional approaches, such as edge detection and variational methods, have been widely explored, while recent advances in deep learning have shown promising results but often require extensive training data. In this work, we propose a novel variational framework for 2D image segmentation that integrates concepts from shape analysis and diffeomorphic transformations. Our method models segmentation as the deformation of a template curve via a diffeomorphic transformation of the image domain, using the Large Deformation Diffeomorphic Metric Mapping (LDDMM) framework. The curve evolution is guided by a loss function that compares the deformed curve to the image gradient field, formulated through the varifold representation of geometric shapes. The approach is implemented in Python with GPU acceleration using the PyKeops library. This framework allows for accurate segmentation with a flexible and theoretically grounded methodology that does not rely on large datasets.
https://arxiv.org/abs/2506.09357
This paper proposes a tropical geometry-based edge detection framework that reformulates convolution and gradient computations using min-plus and max-plus algebra. The tropical formulation emphasizes dominant intensity variations, contributing to sharper and more continuous edge representations. Three variants are explored: an adaptive threshold-based method, a multi-kernel min-plus method, and a max-plus method emphasizing structural continuity. The framework integrates multi-scale processing, Hessian filtering, and wavelet shrinkage to enhance edge transitions while maintaining computational efficiency. Experiments on MATLAB built-in grayscale and color images suggest that tropical formulations integrated with classical operators, such as Canny and LoG, can improve boundary detection in low-contrast and textured regions. Quantitative evaluation using standard edge metrics indicates favorable edge clarity and structural coherence. These results highlight the potential of tropical algebra as a scalable and noise-aware formulation for edge detection in practical image analysis tasks.
https://arxiv.org/abs/2505.18625
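To make the min-plus and max-plus formulation in the abstract above concrete: in tropical algebra the multiply-and-sum of ordinary convolution becomes add-and-max (or add-and-min). The sketch below is a generic illustration, not the paper's framework. It uses the sliding-window form without kernel flipping, skips out-of-bounds neighbours, and the function names are hypothetical. With an all-zero kernel the difference of the two filters reduces to the classical morphological gradient.

```python
def tropical_filter(image, kernel, mode="max"):
    """Max-plus (mode="max") or min-plus (mode="min") filtering of a 2D list.

    Each output pixel is the max/min over the window of
    image[y][x] + kernel[di][dj]; out-of-bounds neighbours are skipped.
    """
    pick = max if mode == "max" else min
    rows, cols = len(image), len(image[0])
    kr, kc = len(kernel) // 2, len(kernel[0]) // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            candidates = []
            for di, kernel_row in enumerate(kernel):
                for dj, k in enumerate(kernel_row):
                    y, x = i + di - kr, j + dj - kc
                    if 0 <= y < rows and 0 <= x < cols:
                        candidates.append(image[y][x] + k)
            out[i][j] = pick(candidates)
    return out

def tropical_gradient(image, kernel):
    """Max-plus response minus min-plus response, large where intensity jumps."""
    upper = tropical_filter(image, kernel, "max")
    lower = tropical_filter(image, kernel, "min")
    return [[u - l for u, l in zip(ur, lr)] for ur, lr in zip(upper, lower)]
```

On a vertical step image such as three rows of `[0, 0, 10, 10]` with a 3x3 zero kernel, the gradient is nonzero only in the two columns adjacent to the step.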
This study addresses the inherent limitations of Multi-Layer Perceptrons (MLPs) in Vision Transformers (ViTs) by introducing Hybrid Kolmogorov-Arnold Network (KAN)-ViT (Hyb-KAN ViT), a novel framework that integrates wavelet-based spectral decomposition and spline-optimized activation functions. Prior work has not focused on the prebuilt modularity of the ViT architecture or on integrating the edge detection capabilities of wavelet functions. We propose two key modules: Efficient-KAN (Eff-KAN), which replaces MLP layers with spline functions, and Wavelet-KAN (Wav-KAN), which leverages orthogonal wavelet transforms for multi-resolution feature extraction. These modules are systematically integrated in ViT encoder layers and classification heads to enhance spatial-frequency modeling while mitigating computational bottlenecks. Experiments on ImageNet-1K (Image Recognition), COCO (Object Detection and Instance Segmentation), and ADE20K (Semantic Segmentation) demonstrate state-of-the-art performance with Hyb-KAN ViT. Ablation studies validate the efficacy of wavelet-driven spectral priors in segmentation and spline-based efficiency in detection tasks. The framework establishes a new paradigm for balancing parameter efficiency and multi-scale representation in vision architectures.
https://arxiv.org/abs/2505.04740
Medical image segmentation is a pivotal task within the realms of medical image analysis and computer vision. While current methods have shown promise in accurately segmenting major regions of interest, the precise segmentation of boundary areas remains challenging. In this study, we propose a novel network architecture named CTO, which combines Convolutional Neural Networks (CNNs), Vision Transformer (ViT) models, and explicit edge detection operators to tackle this challenge. CTO surpasses existing methods in terms of segmentation accuracy and strikes a better balance between accuracy and efficiency, without the need for additional data inputs or label injections. Specifically, CTO adheres to the canonical encoder-decoder network paradigm, with a dual-stream encoder network comprising a mainstream CNN stream for capturing local features and an auxiliary StitchViT stream for integrating long-range dependencies. Furthermore, to enhance the model's ability to learn boundary areas, we introduce a boundary-guided decoder network that employs binary boundary masks generated by dedicated edge detection operators to provide explicit guidance during the decoding process. We validate the performance of CTO through extensive experiments conducted on seven challenging medical image segmentation datasets, namely ISIC 2016, PH2, ISIC 2018, CoNIC, LiTS17, and BTCV. Our experimental results unequivocally demonstrate that CTO achieves state-of-the-art accuracy on these datasets while maintaining competitive model complexity. The codes have been released at: this https URL.
https://arxiv.org/abs/2505.04652
High-fidelity wildfire monitoring using Unmanned Aerial Vehicles (UAVs) typically requires multimodal sensing, especially RGB and thermal imagery, which increases hardware cost and power consumption. This paper introduces SAM-TIFF, a novel teacher-student distillation framework for pixel-level wildfire temperature prediction and segmentation using RGB input only. A multimodal teacher network trained on paired RGB-Thermal imagery and radiometric TIFF ground truth distills knowledge to a unimodal RGB student network, enabling thermal-sensor-free inference. Segmentation supervision is generated using a hybrid approach: Segment Anything Model (SAM)-guided mask generation with selection via TOPSIS, together with a Canny edge detection and Otsu's thresholding pipeline for automatic point prompt selection. Our method is the first to perform per-pixel temperature regression from RGB UAV data, demonstrating strong generalization on the recent FLAME 3 dataset. This work lays the foundation for lightweight, cost-effective UAV-based wildfire monitoring systems without thermal sensors.
https://arxiv.org/abs/2505.01638
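The SAM-TIFF abstract above selects among candidate masks with TOPSIS, a standard multi-criteria ranking method. The sketch below shows the generic TOPSIS procedure only. The criteria, weights, and function name are hypothetical, since the abstract does not state which mask properties are scored.

```python
import math

def topsis(matrix, weights, benefit):
    """Rank alternatives (rows) over criteria (columns) with TOPSIS.

    benefit[j] is True when larger values of criterion j are better.
    Returns one closeness score in [0, 1] per row; higher is better.
    """
    n_crit = len(weights)
    # Vector-normalize each column, then apply the criterion weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0
             for j in range(n_crit)]
    scaled = [[weights[j] * row[j] / norms[j] for j in range(n_crit)]
              for row in matrix]
    columns = list(zip(*scaled))
    best = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]
    scores = []
    for row in scaled:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        denom = d_best + d_worst
        scores.append(d_worst / denom if denom else 0.0)
    return scores
```

For example, with rows holding a hypothetical (mask quality, fragment count) pair per candidate and `benefit=[True, False]`, the candidate that is best on both criteria receives score 1.0 and would be the selected mask.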
Existing edge detection methods often suffer from noise amplification and excessive retention of non-salient details, limiting their applicability in high-precision industrial scenarios. To address these challenges, we propose CAM-EDIT, a novel framework that integrates Channel Attention Mechanism (CAM) and Edge Detection via Independence Testing (EDIT). The CAM module adaptively enhances discriminative edge features through multi-channel fusion, while the EDIT module employs region-wise statistical independence analysis (using Fisher's exact test and chi-square test) to suppress uncorrelated noise. Extensive experiments on BSDS500 and NYUDv2 datasets demonstrate state-of-the-art performance. Among the nine comparison algorithms, the F-measure scores of CAM-EDIT are 0.635 and 0.460, representing improvements of 19.2% to 26.5% over traditional methods (Canny, CannySR), and better than the latest learning-based methods (TIP2020, MSCNGP). Noise robustness evaluations further reveal a 2.2% PSNR improvement under Gaussian noise compared to baseline methods. Qualitative results exhibit cleaner edge maps with reduced artifacts, demonstrating its potential for high-precision industrial applications.
https://arxiv.org/abs/2505.01040
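The CAM-EDIT abstract above relies on region-wise independence testing. One plausible reading, sketched here as an assumption rather than the paper's procedure, is to ask whether pixel brightness is independent of which side of a candidate edge a pixel lies on, using a 2x2 chi-square statistic. The helper names and the brightness cutoff are hypothetical.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 contingency table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    if 0 in row_totals or 0 in col_totals:
        return 0.0  # degenerate table: no evidence of dependence
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def region_table(left, right, cutoff):
    """Count bright/dark pixels in the two regions flanking a candidate edge."""
    def count(region):
        bright = sum(1 for v in region if v > cutoff)
        return [bright, len(region) - bright]
    return [count(left), count(right)]
```

A statistic above roughly 3.84 (the 5% critical value for one degree of freedom) would reject independence, suggesting a genuine intensity change between the regions rather than noise. For very small regions, Fisher's exact test, which the abstract also mentions, is the usual alternative.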
Edge detection is crucial in image processing, but existing methods often produce overly detailed edge maps, affecting clarity. Fixed-window statistical testing faces issues like scale mismatch and computational redundancy. To address these, we propose Multi-scale Adaptive Independence Testing-based Edge Detection and Denoising (EDD-MAIT), a novel edge detection and denoising method that integrates a channel attention mechanism with independence testing. A gradient-driven adaptive window strategy adjusts window sizes dynamically, improving detail preservation and noise suppression. EDD-MAIT achieves better robustness, accuracy, and efficiency, outperforming traditional and learning-based methods on BSDS500 and BIPED datasets, with improvements in F-score, MSE, PSNR, and reduced runtime. It also shows robustness against Gaussian noise, generating accurate and clean edge maps in noisy environments.
https://arxiv.org/abs/2505.01032
Here, we propose Deep CS-TRD, a new automatic algorithm for detecting tree rings in whole cross-sections. It replaces the edge detection step of CS-TRD with a deep-learning-based approach (U-Net), which allows the method to be applied to different image domains (microscopy, scanner, or smartphone acquired) and species (Pinus taeda, Gleditsia triacanthos, and Salix glauca). Additionally, we introduce two publicly available datasets of annotated images to the community. The proposed method outperforms state-of-the-art approaches in macro images (Pinus taeda and Gleditsia triacanthos) while showing slightly lower performance in microscopy images of Salix glauca. To our knowledge, this is the first paper that studies automatic tree ring detection for such different species and acquisition conditions. The dataset and source code are available at this https URL.
https://arxiv.org/abs/2504.16242
Edge detection has attracted considerable attention thanks to its exceptional ability to enhance performance in downstream computer vision tasks. In recent years, various deep learning methods have been explored for edge detection tasks resulting in a significant performance improvement compared to conventional computer vision algorithms. In neural networks, edge detection tasks require considerably large receptive fields to provide satisfactory performance. In a typical convolutional operation, such a large receptive field can be achieved by utilizing a significant number of consecutive layers, which yields deep network structures. Recently, a Multi-scale Tensorial Summation (MTS) factorization operator was presented, which can achieve very large receptive fields even from the initial layers. In this paper, we propose a novel MTS Dimensional Reduction (MTS-DR) module guided neural network, MTS-DR-Net, for the edge detection task. The MTS-DR-Net uses MTS layers, and corresponding MTS-DR blocks as a new backbone to remove redundant information initially. Such a dimensional reduction module enables the neural network to focus specifically on relevant information (i.e., necessary subspaces). Finally, a weight U-shaped refinement module follows MTS-DR blocks in the MTS-DR-Net. We conducted extensive experiments on two benchmark edge detection datasets: BSDS500 and BIPEDv2 to verify the effectiveness of our model. The implementation of the proposed MTS-DR-Net can be found at this https URL.
https://arxiv.org/abs/2504.15770
This paper presents a comprehensive evaluation framework for image segmentation algorithms, encompassing naive methods, machine learning approaches, and deep learning techniques. We begin by introducing the fundamental concepts and importance of image segmentation, and the role of interactive segmentation in enhancing accuracy. A detailed background theory section explores various segmentation methods, including thresholding, edge detection, region growing, feature extraction, random forests, support vector machines, convolutional neural networks, U-Net, and Mask R-CNN. The implementation and experimental setup are thoroughly described, highlighting three primary approaches: algorithm assisting user, user assisting algorithm, and hybrid methods. Evaluation metrics such as Intersection over Union (IoU), computation time, and user interaction time are employed to measure performance. A comparative analysis presents detailed results, emphasizing the strengths, limitations, and trade-offs of each method. The paper concludes with insights into the practical applicability of these approaches across various scenarios and outlines future work, focusing on expanding datasets, developing more representative approaches, integrating real-time feedback, and exploring weakly supervised and self-supervised learning paradigms to enhance segmentation accuracy and efficiency. Keywords: Image Segmentation, Interactive Segmentation, Machine Learning, Deep Learning, Computer Vision
https://arxiv.org/abs/2504.04435
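The evaluation framework above uses Intersection over Union (IoU) as its accuracy metric. A minimal sketch for binary masks stored as nested lists follows. Returning 1.0 when both masks are empty is a convention chosen here, not something the abstract specifies.

```python
def iou(pred, target):
    """Intersection over Union of two same-shaped binary masks (nested lists)."""
    intersection = union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks agree perfectly under this convention.
    return intersection / union if union else 1.0
```

For instance, a prediction covering two pixels of which one overlaps a one-pixel target gives an IoU of 0.5.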
Edge detection remains a fundamental yet challenging task in computer vision, especially under varying illumination, noise, and complex scene conditions. This paper introduces a Hybrid Multi-Stage Learning Framework that integrates Convolutional Neural Network (CNN) feature extraction with a Support Vector Machine (SVM) classifier to improve edge localization and structural accuracy. Unlike conventional end-to-end deep learning models, our approach decouples feature representation and classification stages, enhancing robustness and interpretability. Extensive experiments conducted on benchmark datasets such as BSDS500 and NYUDv2 demonstrate that the proposed framework outperforms traditional edge detectors and even recent learning-based methods in terms of Optimal Dataset Scale (ODS) and Optimal Image Scale (OIS), while maintaining competitive Average Precision (AP). Both qualitative and quantitative results highlight enhanced performance on edge continuity, noise suppression, and perceptual clarity achieved by our method. This work not only bridges classical and deep learning paradigms but also sets a new direction for scalable, interpretable, and high-quality edge detection solutions.
https://arxiv.org/abs/2503.21827
The CLIP model has demonstrated significant advancements in aligning visual and language modalities through large-scale pre-training on image-text pairs, enabling strong zero-shot classification and retrieval capabilities on various domains. However, CLIP's training remains computationally intensive, with high demands on both data processing and memory. To address these challenges, recent masking strategies have emerged, focusing on the selective removal of image patches to improve training efficiency. Although effective, these methods often compromise key semantic information, resulting in suboptimal alignment between visual features and text descriptions. In this work, we present a concise yet effective approach called Patch Generation-to-Selection to enhance CLIP's training efficiency while preserving critical semantic content. Our method introduces a gradual masking process in which a small set of candidate patches is first pre-selected as potential mask regions. Then, we apply Sobel edge detection across the entire image to generate an edge mask that prioritizes the retention of the primary object areas. Finally, similarity scores between the candidate mask patches and their neighboring patches are computed, with optimal transport normalization refining the selection process to ensure a balanced similarity matrix. Our approach, CLIP-PGS, sets new state-of-the-art results in zero-shot classification and retrieval tasks, achieving superior performance in robustness evaluation and language compositionality benchmarks.
https://arxiv.org/abs/2503.17080
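The CLIP-PGS abstract above uses Sobel edge detection to decide which image regions should be retained rather than masked. The sketch below computes Sobel gradient magnitude and averages it per patch. Using the per-patch mean as a retention score is an assumption made for illustration, and the function names are hypothetical.

```python
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_magnitude(image):
    """Sobel gradient magnitude of a 2D list; border pixels are left at 0."""
    rows, cols = len(image), len(image[0])
    mag = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            gx = gy = 0.0
            for di in range(3):
                for dj in range(3):
                    v = image[i + di - 1][j + dj - 1]
                    gx += SOBEL_X[di][dj] * v
                    gy += SOBEL_Y[di][dj] * v
            mag[i][j] = (gx * gx + gy * gy) ** 0.5
    return mag

def patch_edge_scores(image, patch):
    """Mean Sobel magnitude per non-overlapping patch, keyed by (row, col)."""
    mag = sobel_magnitude(image)
    rows, cols = len(image), len(image[0])
    scores = {}
    for pi in range(0, rows - patch + 1, patch):
        for pj in range(0, cols - patch + 1, patch):
            block = [mag[i][j]
                     for i in range(pi, pi + patch)
                     for j in range(pj, pj + patch)]
            scores[(pi // patch, pj // patch)] = sum(block) / len(block)
    return scores
```

Patches with high scores contain strong edges and, under this reading, would be protected from masking, while low-score patches remain candidates for removal.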
This study presents a novel approach for roof detail extraction and vectorization using remote sensing images. Unlike previous geometric-primitive-based methods that rely on the detection of corners, our method focuses on edge detection as the primary mechanism for roof reconstruction, while utilizing geometric relationships to define corners and faces. We adapt the YOLOv8 OBB model, originally designed for rotated object detection, to extract roof edges effectively. Our method demonstrates robustness against noise and occlusion, leading to precise vectorized representations of building roofs. Experiments conducted on the SGA and Melville datasets highlight the method's effectiveness. At the raster level, our model outperforms the state-of-the-art foundation segmentation model (SAM), achieving an mIoU between 0.85 and 1 for most samples and an ovIoU close to 0.97. At the vector level, evaluation using the Hausdorff distance, PolyS metric, and our raster-vector-metric demonstrates significant improvements after polygonization, with a close approximation to the reference data. The method successfully handles diverse roof structures and refines edge gaps, even on complex roof structures from new datasets excluded from training. Our findings underscore the potential of this approach to address challenges in automatic roof structure vectorization, supporting various applications such as urban terrain reconstruction.
https://arxiv.org/abs/2503.09187
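The roof vectorization abstract above evaluates vector output with the Hausdorff distance. For reference, a brute-force sketch of the symmetric Hausdorff distance between two finite point sets follows. Sampling polygons into vertex lists is an assumption here, since the abstract does not say how the roof polygons are discretized.

```python
import math

def directed_hausdorff(a, b):
    """Largest distance from any point in a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))
```

A small value means every vertex of the predicted outline lies close to the reference outline and vice versa. A single stray vertex is enough to make the distance large, which is why the metric is sensitive to outliers.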
This study developed an algorithm capable of detecting a reference line (a 0.2 mm thick piano wire) to accurately determine the position of an automated installation robot within an elevator shaft. A total of 3,245 images were collected from the experimental tower of H Company, the leading elevator manufacturer in South Korea, and the detection performance was evaluated using four experimental approaches (GCH, GSCH, GECH, FCH). During the initial image processing stage, Gaussian blurring, a sharpening filter, an embossing filter, and the Fourier Transform were applied, followed by Canny Edge Detection and the Hough Transform. Notably, the method was developed to accurately extract the reference line by averaging the x-coordinates of the lines detected through the Hough Transform. This approach enabled the detection of the 0.2 mm thick piano wire with high accuracy, even in the presence of noise and other interfering factors (e.g., concrete cracks inside the elevator shaft or safety bars for filming equipment). The experimental results showed that Experiment 4 (FCH), which utilized the Fourier Transform in the preprocessing stage, achieved the highest detection rate for the LtoL, LtoR, and RtoL datasets. Experiment 2 (GSCH), which applied Gaussian blurring and a sharpening filter, demonstrated superior detection performance on the RtoR dataset. This study proposes a reference line detection algorithm that enables precise position calculation and control of automated robots in elevator shaft installation. Moreover, the developed method shows potential for applicability even in confined working spaces. Future work aims to develop a line detection algorithm equipped with machine learning-based hyperparameter tuning capabilities.
https://arxiv.org/abs/2503.13473
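The elevator-shaft abstract above extracts the reference line by averaging the x-coordinates of the lines returned by the Hough Transform. A minimal sketch of that averaging step follows. The segment format `(x1, y1, x2, y2)`, the near-vertical filter, and the `max_tilt_px` parameter are assumptions added for illustration and are not described in the abstract.

```python
def reference_line_x(lines, max_tilt_px=5):
    """Estimate the x-position of a near-vertical reference line.

    lines: iterable of (x1, y1, x2, y2) segments, for example the output of
    a probabilistic Hough transform. Segments whose endpoints differ by more
    than max_tilt_px in x are ignored (an assumed filter). Returns None when
    no segment qualifies.
    """
    xs = []
    for x1, y1, x2, y2 in lines:
        if abs(x1 - x2) <= max_tilt_px:
            xs.extend([x1, x2])
    if not xs:
        return None
    return sum(xs) / len(xs)
```

Averaging over several detected segments smooths out the jitter of individual detections, while the tilt filter discards roughly horizontal structures such as safety bars.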
Extracting geometric edges from unstructured point clouds remains a significant challenge, particularly in thin-walled structures that are commonly found in everyday objects. Traditional geometric methods and recent learning-based approaches frequently struggle with these structures, as both rely heavily on sufficient contextual information from local point neighborhoods. However, 3D measurement data of thin-walled structures often lack the accurate, dense, and regular neighborhood sampling required for reliable edge extraction, resulting in degraded performance. In this work, we introduce STAR-Edge, a novel approach designed for detecting and refining edge points in thin-walled structures. Our method leverages a unique representation, the local spherical curve, to create structure-aware neighborhoods that emphasize co-planar points while reducing interference from close-by, non-co-planar surfaces. This representation is transformed into a rotation-invariant descriptor, which, combined with a lightweight multi-layer perceptron, enables robust edge point classification even in the presence of noise and sparse or irregular sampling. We also use the local spherical curve representation to estimate more precise normals and introduce an optimization function to project initially identified edge points exactly onto the true edges. Experiments conducted on the ABC dataset and thin-walled structure-specific datasets demonstrate that STAR-Edge outperforms existing edge detection methods, showcasing better robustness under various challenging conditions.
https://arxiv.org/abs/2503.00801
Full-Waveform Inversion (FWI) seeks to achieve a high-resolution model of the subsurface through the application of multi-variate optimization to the seismic inverse problem. Although now a mature technology, FWI has limitations related to the choice of the appropriate solver for the forward problem in challenging environments requiring complex assumptions, and the very wide-angle and multi-azimuth data necessary for full reconstruction are often not available. Deep Learning techniques have emerged as excellent optimization frameworks. Data-driven methods do not impose a wave propagation model and are not exposed to modelling errors. In contrast, deterministic models are governed by the laws of physics. Seismic FWI has recently started to be investigated as a Deep Learning framework. The focus has been on the time domain, while the pseudo-spectral domain has not yet been explored. However, classical FWI experienced major breakthroughs when pseudo-spectral approaches were employed. This work addresses the gap in incorporating the pseudo-spectral approach within Deep Learning by re-formulating the pseudo-spectral FWI problem as a Deep Learning algorithm for a theory-driven pseudo-spectral approach. A novel Recurrent Neural Network (RNN) framework is proposed. It is qualitatively assessed on synthetic data, applied to a two-dimensional Marmousi dataset, and evaluated against deterministic and time-based approaches. Pseudo-spectral theory-guided FWI using the RNN was shown to be more accurate than classical FWI, with only 0.05 error tolerance and 1.45% relative percentage error. It also provides more stable convergence, identifies faults better, and has more low-frequency content than classical FWI. Moreover, the RNN was better suited than classical FWI to edge detection in the shallow and deep sections due to cleaner receiver residuals.
https://arxiv.org/abs/2502.17624
To satisfy the rigorous requirements of precise edge detection in critical high-accuracy measurements, this article proposes a series of efficient approaches for localizing subpixel edges. In contrast to fitting-based methods, which consider pixel intensity as a sample value derived from a specific model, we take a different perspective by assuming that the intensity at the pixel level can be interpreted as a local integral mapping in the intensity model for subpixel localization. Consequently, we propose a straightforward subpixel edge localization method called Converted Intensity Summation (CIS). To address the limited robustness associated with focusing solely on the localization of individual edge points, a Stable Edge Region (SER) based algorithm is presented to alleviate local interference near edges. Given the observation that edge statistics are consistent within a local region, the algorithm seeks correlated stable regions in the vicinity of edges to facilitate the acquisition of robust parameters and achieve higher-precision positioning. In addition, an edge complement method based on extension-adjustment is introduced to rectify irregular edges through the efficient migration of SERs. A large number of experiments are conducted on both synthetic and real image datasets, which cover common edge patterns as well as various real scenarios such as industrial PCB images, remote sensing, and medical images. It is verified that CIS can achieve higher accuracy than the state-of-the-art method, while requiring less execution time. Moreover, by integrating SER into CIS, the proposed algorithm demonstrates excellent performance in further improving the anti-interference capability and positioning accuracy.
https://arxiv.org/abs/2502.16502
An ocean front is defined as the interface between different water masses and plays a vital role in the evolution of many physical phenomena. Previous detection methods are based on histograms, Lyapunov exponents, gradients, and machine learning. These algorithms, however, introduce discontinuity and inaccuracy, use less information, or merely approach traditional results. Moreover, the automatic front tracking algorithms in preceding studies are not open source. This paper focuses on large-scale ocean fronts and proposes an automatic front detection and tracking algorithm based on Bayesian decision and metric space. Front merging, filling, and ring deletion are put forward to enhance continuity. The distance between fronts on different days is first defined and is well-defined in a metric space for functional analysis. These techniques can be migrated to other areas of computer vision such as edge detection and tracking.
https://arxiv.org/abs/2502.15250