Edge detection (ED) remains a fundamental task in computer vision, yet its performance is often hindered by the ambiguous nature of non-edge pixels near object boundaries. The widely adopted Weighted Binary Cross-Entropy (WBCE) loss treats all non-edge pixels uniformly, overlooking the structural nuances around edges and often resulting in blurred predictions. In this paper, we propose the Edge-Boundary-Texture (EBT) loss, a novel objective that explicitly divides pixels into three categories (edge, boundary, and texture) and assigns each a distinct supervisory weight. This tri-class formulation enables more structured learning by guiding the model to focus on both edge precision and contextual boundary localization. We theoretically show that the EBT loss generalizes the WBCE loss, with the latter emerging as a limiting case. Extensive experiments across multiple benchmarks demonstrate the superiority of the EBT loss both quantitatively and perceptually. Furthermore, the consistent use of unified hyperparameters across all models and datasets, along with robustness to their moderate variations, indicates that the EBT loss requires minimal fine-tuning and is easily deployable in practice.
https://arxiv.org/abs/2507.06569
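As a concrete illustration of the tri-class idea, here is a minimal PyTorch sketch of an EBT-style loss. The dilation-based boundary band and the specific weights are assumptions made for illustration, not the paper's exact formulation.

```python
# Minimal sketch of a tri-class (edge / boundary / texture) weighted BCE.
# The dilation-based boundary band and the example weights are assumptions
# made for illustration; see the paper for the exact EBT formulation.
import torch
import torch.nn.functional as F

def ebt_loss(logits, edge_gt, band=4, w_edge=2.0, w_boundary=0.5, w_texture=1.0):
    """logits: (N,1,H,W); edge_gt: (N,1,H,W) binary edge map."""
    # Boundary = non-edge pixels within `band` pixels of an edge
    # (max-pooling a binary map acts as a dilation).
    dilated = F.max_pool2d(edge_gt, kernel_size=2 * band + 1, stride=1, padding=band)
    boundary = dilated * (1.0 - edge_gt)       # near-edge, non-edge pixels
    texture = 1.0 - dilated                    # pixels far from any edge
    weights = w_edge * edge_gt + w_boundary * boundary + w_texture * texture
    return F.binary_cross_entropy_with_logits(logits, edge_gt, weight=weights)

logits = torch.randn(2, 1, 64, 64)
gt = (torch.rand(2, 1, 64, 64) > 0.95).float()
print(ebt_loss(logits, gt).item())
```

Setting w_boundary equal to w_texture treats all non-edge pixels uniformly again, which is how a WBCE-style loss appears as a limiting case of this construction.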
Fever screening based on infrared thermographs (IRTs) is a viable mass screening approach during infectious disease pandemics, such as Ebola and SARS, for temperature monitoring in public places like hospitals and airports. IRTs have been found to be powerful, quick, and non-invasive tools for detecting elevated temperatures. Moreover, regions medially adjacent to the inner canthi (called the canthi regions in this paper) are preferred sites for fever screening. Accurate localization of the canthi regions can be achieved through multi-modal registration of infrared (IR) and white-light images. We propose a registration method that follows a coarse-to-fine strategy, using different registration models based on landmarks and on edge detection along the eye contours. We evaluated the registration accuracy to be within 2.7 mm, which enables accurate localization of the canthi regions.
https://arxiv.org/abs/2507.02955
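A minimal sketch of the coarse stage of such a landmark-based registration, using OpenCV's robust similarity-transform estimator; the landmark coordinates are synthetic placeholders, and the fine, edge-based stage around the eye contours is only indicated in a comment.

```python
# Sketch of the coarse stage of a landmark-based IR / white-light registration:
# estimate a robust similarity transform from matched landmarks, then warp the
# IR frame into the white-light frame. All coordinates below are synthetic
# placeholders, not clinical data.
import cv2
import numpy as np

ir = np.zeros((240, 320), np.uint8)                 # stand-in IR frame
pts_ir = np.float32([[100, 80], [220, 82], [160, 150], [120, 200]])
pts_wl = np.float32([[110, 90], [230, 95], [172, 160], [131, 210]])

# Coarse model: similarity (rotation + scale + translation), RANSAC-robust.
M, inliers = cv2.estimateAffinePartial2D(pts_ir, pts_wl, method=cv2.RANSAC)
warped = cv2.warpAffine(ir, M, (320, 240))

# A fine stage could then refine M locally around the eye contours using an
# edge-based criterion restricted to the canthi regions.
residual = np.linalg.norm(cv2.transform(pts_ir[None], M)[0] - pts_wl, axis=1)
print("warped:", warped.shape, "| per-landmark residual (px):", residual.round(2))
```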
This paper presents a newly developed mobile application designed to diagnose Latent Tuberculosis Infection (LTBI) using the Mantoux Skin Test (TST). Traditional TST methods often suffer from low follow-up return rates, patient discomfort, and subjective manual interpretation, particularly with the ball-point pen method, leading to misdiagnosis and delayed treatment. Moreover, unlike previously developed mobile applications that relied on 3D reconstruction, this app utilizes scaling stickers as reference objects for induration measurement. The application integrates advanced image processing technologies, including ARCore, and machine learning algorithms such as DeepLabv3 for robust image segmentation and precise measurement of skin indurations indicative of LTBI. The system employs an edge detection algorithm to enhance accuracy. The application was evaluated against standard clinical practices, demonstrating significant improvements in accuracy and reliability. This innovation is crucial for effective tuberculosis management, especially in resource-limited regions. By automating and standardizing TST evaluations, the application enhances the accessibility and efficiency of TB diagnostics. Future work will focus on refining machine learning models, optimizing measurement algorithms, expanding functionalities to include comprehensive patient data management, and enhancing ARCore's performance across various lighting conditions and operational settings.
https://arxiv.org/abs/2506.17954
Robust and accurate ball detection is a critical component for autonomous humanoid soccer robots, particularly in dynamic and challenging environments such as RoboCup outdoor fields. However, traditional supervised approaches require extensive manual annotation, which is costly and time-intensive. To overcome this problem, we present a self-supervised learning framework for domain-adaptive feature extraction to enhance ball detection performance. The proposed approach leverages a general-purpose pretrained model to generate pseudo-labels, which are then used in a suite of self-supervised pretext tasks -- including colorization, edge detection, and triplet loss -- to learn robust visual features without relying on manual annotations. Additionally, a model-agnostic meta-learning (MAML) strategy is incorporated to ensure rapid adaptation to new deployment scenarios with minimal supervision. A new dataset comprising 10,000 labeled images from outdoor RoboCup SPL matches is introduced, used to validate the method, and made available to the community. Experimental results demonstrate that the proposed pipeline outperforms baseline models in terms of accuracy, F1 score, and IoU, while also exhibiting faster convergence.
https://arxiv.org/abs/2506.16821
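Of the three pretext tasks, the triplet objective is the easiest to sketch; below is a hedged PyTorch example with a toy encoder, where the anchor/positive/negative crops stand in for teacher-generated pseudo-labelled ball and background patches.

```python
# Minimal sketch of the triplet-loss pretext task: pull embeddings of two
# views of the same (pseudo-labelled) ball crop together and push a background
# crop away. The tiny encoder and random crops are illustrative stand-ins.
import torch
import torch.nn as nn

encoder = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 64),
)
triplet = nn.TripletMarginLoss(margin=1.0)

anchor = torch.rand(8, 3, 64, 64)                    # ball crop (pseudo-label)
positive = anchor + 0.05 * torch.randn_like(anchor)  # augmented view
negative = torch.rand(8, 3, 64, 64)                  # background crop

loss = triplet(encoder(anchor), encoder(positive), encoder(negative))
loss.backward()
print(float(loss))
```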
Image edge detection (ED) faces a fundamental mismatch between training and inference: models are trained using continuous-valued outputs but evaluated using binary predictions. This misalignment, caused by the non-differentiability of binarization, weakens the link between learning objectives and actual task performance. In this paper, we propose a theoretical method to design a Binarization-Aware Adjuster (BAA), which explicitly incorporates binarization behavior into gradient-based optimization. At the core of BAA is a novel loss adjustment mechanism based on a Distance Weight Function (DWF), which reweights pixel-wise contributions according to their correctness and proximity to the decision boundary. This emphasizes decision-critical regions while down-weighting less influential ones. We also introduce a self-adaptive procedure to estimate the optimal binarization threshold for BAA, further aligning training dynamics with inference behavior. Extensive experiments across various architectures and datasets demonstrate the effectiveness of our approach. Beyond ED, BAA offers a generalizable strategy for bridging the gap between continuous optimization and discrete evaluation in structured prediction tasks.
https://arxiv.org/abs/2506.12460
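A hedged sketch of the general mechanism: per-pixel BCE terms are reweighted by a Distance Weight Function of each prediction's distance to the binarization threshold, with wrong pixels emphasized. The exponential DWF and the factor-of-two penalty for incorrect pixels are illustrative choices, not the paper's exact definitions.

```python
# Sketch of a binarization-aware reweighting in the spirit of BAA: pixel losses
# are scaled by a Distance Weight Function (DWF) of the gap between the
# predicted probability and the binarization threshold t. The exponential DWF
# and the 2x penalty for wrong pixels are illustrative assumptions.
import torch
import torch.nn.functional as F

def baa_adjusted_bce(logits, target, t=0.5, alpha=4.0):
    p = torch.sigmoid(logits)
    per_pixel = F.binary_cross_entropy_with_logits(logits, target, reduction="none")
    correct = ((p >= t) == (target >= 0.5)).float()
    # Near-threshold pixels are decision-critical: upweight them, and weight
    # incorrect pixels more heavily than correct ones.
    dwf = torch.exp(-alpha * (p - t).abs())
    weight = dwf * (2.0 - correct)
    return (weight * per_pixel).mean()

logits = torch.randn(1, 1, 32, 32, requires_grad=True)
target = (torch.rand(1, 1, 32, 32) > 0.9).float()
baa_adjusted_bce(logits, target).backward()
print("ok")
```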
The adoption of neural network models in medical imaging has been constrained by strict privacy regulations, limited data availability, high acquisition costs, and demographic biases. Deep generative models offer a promising solution by generating synthetic data that bypasses privacy concerns and addresses fairness by producing samples for under-represented groups. However, unlike natural images, medical imaging requires validation not only for fidelity (e.g., Fréchet Inception Score) but also for morphological and clinical accuracy. This is particularly true for colour fundus retinal imaging, which requires precise replication of the retinal vascular network, including vessel topology, continuity, and thickness. In this study, we investigated whether a distance-based loss function, built on the deep activation layers of a large foundational model trained on a large corpus of domain data (colour fundus images), offers advantages over perceptual-loss and edge-detection-based loss functions. Our extensive validation pipeline, based on both domain-free and domain-specific tasks, suggests that domain-specific deep features do not improve autoencoder image generation. Conversely, our findings highlight the effectiveness of conventional edge detection filters in improving the sharpness of vascular structures in synthetic samples.
https://arxiv.org/abs/2506.11753
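The "conventional edge detection filter" finding lends itself to a short sketch: an auxiliary loss term compares Sobel responses of real and generated images so that vessel edges stay sharp. The L1 pairing and the weighting factor are assumptions.

```python
# Sketch of an edge-detection-based reconstruction loss: penalise differences
# between Sobel responses of real and generated fundus images so vessel edges
# stay sharp. Pairing with an L1 term and the weight lam are assumptions.
import torch
import torch.nn.functional as F

kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
ky = kx.transpose(2, 3)                     # Sobel y kernel = transpose of x

def sobel(x):                               # x: (N,1,H,W) grayscale
    gx = F.conv2d(x, kx, padding=1)
    gy = F.conv2d(x, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)

def edge_loss(real, fake, lam=1.0):
    return F.l1_loss(fake, real) + lam * F.l1_loss(sobel(fake), sobel(real))

real = torch.rand(2, 1, 128, 128)
fake = torch.rand(2, 1, 128, 128, requires_grad=True)
edge_loss(real, fake).backward()
print("ok")
```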
Image segmentation is a fundamental task in computer vision aimed at delineating object boundaries within images. Traditional approaches, such as edge detection and variational methods, have been widely explored, while recent advances in deep learning have shown promising results but often require extensive training data. In this work, we propose a novel variational framework for 2D image segmentation that integrates concepts from shape analysis and diffeomorphic transformations. Our method models segmentation as the deformation of a template curve via a diffeomorphic transformation of the image domain, using the Large Deformation Diffeomorphic Metric Mapping (LDDMM) framework. The curve evolution is guided by a loss function that compares the deformed curve to the image gradient field, formulated through the varifold representation of geometric shapes. The approach is implemented in Python with GPU acceleration using the PyKeops library. This framework allows for accurate segmentation with a flexible and theoretically grounded methodology that does not rely on large datasets.
https://arxiv.org/abs/2506.09357
We present a quantitative circuit-level analysis of diffusion models, establishing computational pathways and mechanistic principles underlying image generation processes. Through systematic intervention experiments across 2,000 synthetic and 2,000 CelebA facial images, we discover fundamental algorithmic differences in how diffusion architectures process synthetic versus naturalistic data distributions. Our investigation reveals that real-world face processing requires circuits with measurably higher computational complexity (complexity ratio = 1.084 ± 0.008, p < 0.001), exhibiting distinct attention specialization patterns with entropy divergence ranging from 0.015 to 0.166 across denoising timesteps. We identify eight functionally distinct attention mechanisms showing specialized computational roles: edge detection (entropy = 3.18 ± 0.12), texture analysis (entropy = 4.16 ± 0.08), and semantic understanding (entropy = 2.67 ± 0.15). Intervention analysis demonstrates critical computational bottlenecks where targeted ablations produce 25.6% to 128.3% performance degradation, providing causal evidence for identified circuit functions. These findings establish quantitative foundations for algorithmic understanding and control of generative model behavior through mechanistic intervention strategies.
https://arxiv.org/abs/2506.17237
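The entropy numbers above come from measuring how concentrated each attention head's distribution is; a minimal NumPy sketch of such a measurement (shapes and averaging scheme assumed) follows.

```python
# Sketch of the attention-entropy measurement used to characterise heads:
# Shannon entropy of each head's row-normalised attention distribution.
# The (heads, queries, keys) shape and query-averaging are assumptions.
import numpy as np

def head_entropy(attn):
    """attn: (heads, queries, keys), rows sum to 1. Entropy per head."""
    eps = 1e-12
    h = -(attn * np.log(attn + eps)).sum(axis=-1)   # (heads, queries)
    return h.mean(axis=-1)                          # average over queries

rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 64, 64))
attn = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
print(head_entropy(attn).round(2))  # low entropy ~ focused (e.g. edges),
                                    # high entropy ~ diffuse (e.g. texture)
```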
This paper proposes a tropical geometry-based edge detection framework that reformulates convolution and gradient computations using min-plus and max-plus algebra. The tropical formulation emphasizes dominant intensity variations, contributing to sharper and more continuous edge representations. Three variants are explored: an adaptive threshold-based method, a multi-kernel min-plus method, and a max-plus method emphasizing structural continuity. The framework integrates multi-scale processing, Hessian filtering, and wavelet shrinkage to enhance edge transitions while maintaining computational efficiency. Experiments on MATLAB built-in grayscale and color images suggest that tropical formulations integrated with classical operators, such as Canny and LoG, can improve boundary detection in low-contrast and textured regions. Quantitative evaluation using standard edge metrics indicates favorable edge clarity and structural coherence. These results highlight the potential of tropical algebra as a scalable and noise-aware formulation for edge detection in practical image analysis tasks.
https://arxiv.org/abs/2505.18625
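To make the min-plus/max-plus reformulation concrete, here is a small NumPy sketch where the usual multiply-accumulate of convolution is replaced by adding the kernel and taking a min or max; the max-plus minus min-plus response with a flat kernel (a morphological-gradient-like quantity) is an illustrative edge measure, not the paper's full pipeline.

```python
# Sketch of a tropical (min-plus / max-plus) "convolution": the sum-product of
# ordinary convolution becomes a min (or max) over kernel-shifted sums. The
# flat zero kernel and the max-minus-min edge response are illustrative.
import numpy as np

def tropical_filter(img, kernel, mode="min"):
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    pad = np.pad(img.astype(float), ((ph, ph), (pw, pw)), mode="edge")
    out = np.empty(img.shape, dtype=float)
    op = np.min if mode == "min" else np.max
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = op(pad[i:i + kh, j:j + kw] + kernel)
    return out

img = np.zeros((32, 32)); img[:, 16:] = 255.0       # vertical step edge
k = np.zeros((3, 3))                                # flat structuring kernel
edges = tropical_filter(img, k, "max") - tropical_filter(img, k, "min")
print(edges[16, 14:19])                             # nonzero at the transition
```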
This study addresses the inherent limitations of Multi-Layer Perceptrons (MLPs) in Vision Transformers (ViTs) by introducing Hybrid Kolmogorov-Arnold Network (KAN)-ViT (Hyb-KAN ViT), a novel framework that integrates wavelet-based spectral decomposition and spline-optimized activation functions; prior work has failed to exploit the prebuilt modularity of the ViT architecture and to integrate the edge detection capabilities of wavelet functions. We propose two key modules: Efficient-KAN (Eff-KAN), which replaces MLP layers with spline functions, and Wavelet-KAN (Wav-KAN), which leverages orthogonal wavelet transforms for multi-resolution feature extraction. These modules are systematically integrated into ViT encoder layers and classification heads to enhance spatial-frequency modeling while mitigating computational bottlenecks. Experiments on ImageNet-1K (Image Recognition), COCO (Object Detection and Instance Segmentation), and ADE20K (Semantic Segmentation) demonstrate state-of-the-art performance with Hyb-KAN ViT. Ablation studies validate the efficacy of wavelet-driven spectral priors in segmentation and spline-based efficiency in detection tasks. The framework establishes a new paradigm for balancing parameter efficiency and multi-scale representation in vision architectures.
https://arxiv.org/abs/2505.04740
Medical image segmentation is a pivotal task within the realms of medical image analysis and computer vision. While current methods have shown promise in accurately segmenting major regions of interest, the precise segmentation of boundary areas remains challenging. In this study, we propose a novel network architecture named CTO, which combines Convolutional Neural Networks (CNNs), Vision Transformer (ViT) models, and explicit edge detection operators to tackle this challenge. CTO surpasses existing methods in terms of segmentation accuracy and strikes a better balance between accuracy and efficiency, without the need for additional data inputs or label injections. Specifically, CTO adheres to the canonical encoder-decoder network paradigm, with a dual-stream encoder network comprising a mainstream CNN stream for capturing local features and an auxiliary StitchViT stream for integrating long-range dependencies. Furthermore, to enhance the model's ability to learn boundary areas, we introduce a boundary-guided decoder network that employs binary boundary masks generated by dedicated edge detection operators to provide explicit guidance during the decoding process. We validate the performance of CTO through extensive experiments conducted on seven challenging medical image segmentation datasets, namely ISIC 2016, PH2, ISIC 2018, CoNIC, LiTS17, and BTCV. Our experimental results unequivocally demonstrate that CTO achieves state-of-the-art accuracy on these datasets while maintaining competitive model complexity. The codes have been released at: this https URL.
https://arxiv.org/abs/2505.04652
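A minimal sketch of the explicit boundary guidance: apply an edge operator to a (toy) segmentation label to obtain the binary boundary mask that the decoder consumes. Sobel is used here as a stand-in for whichever dedicated operator CTO employs.

```python
# Sketch of producing a binary boundary mask from a segmentation label with an
# explicit edge operator, as used to guide the decoder. Sobel stands in for
# the dedicated operator used by CTO; the circular label is a toy example.
import cv2
import numpy as np

label = np.zeros((128, 128), np.uint8)
cv2.circle(label, (64, 64), 40, 1, -1)              # toy organ mask

gx = cv2.Sobel(label.astype(np.float32), cv2.CV_32F, 1, 0, ksize=3)
gy = cv2.Sobel(label.astype(np.float32), cv2.CV_32F, 0, 1, ksize=3)
boundary = (np.hypot(gx, gy) > 0).astype(np.uint8)  # thin band around the mask

print("boundary pixels:", int(boundary.sum()))
```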
High-fidelity wildfire monitoring using Unmanned Aerial Vehicles (UAVs) typically requires multimodal sensing - especially RGB and thermal imagery - which increases hardware cost and power consumption. This paper introduces SAM-TIFF, a novel teacher-student distillation framework for pixel-level wildfire temperature prediction and segmentation using RGB input only. A multimodal teacher network trained on paired RGB-thermal imagery and radiometric TIFF ground truth distills knowledge to a unimodal RGB student network, enabling thermal-sensor-free inference. Segmentation supervision is generated using a hybrid approach: Segment Anything (SAM)-guided mask generation with selection via TOPSIS, combined with a Canny edge detection and Otsu thresholding pipeline for automatic point prompt selection. Our method is the first to perform per-pixel temperature regression from RGB UAV data, demonstrating strong generalization on the recent FLAME 3 dataset. This work lays the foundation for lightweight, cost-effective UAV-based wildfire monitoring systems without thermal sensors.
https://arxiv.org/abs/2505.01638
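The automatic point-prompt step can be sketched with standard OpenCV calls: Otsu thresholding isolates candidate hot regions, Canny outlines them, and component centroids become prompts for SAM. The TOPSIS-based selection and all parameter values are omitted or assumed.

```python
# Sketch of the automatic point-prompt idea: Otsu thresholding isolates the
# bright (hot) regions, Canny traces their outlines, and the centroids of the
# connected components become candidate point prompts for SAM. The TOPSIS
# selection step is omitted and all values below are illustrative.
import cv2
import numpy as np

rng = np.random.default_rng(1)
img = np.clip(rng.normal(60, 10, (128, 128)), 0, 255).astype(np.uint8)
cv2.circle(img, (40, 80), 15, 220, -1)              # toy "hot" fire blob

_, mask = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
edges = cv2.Canny(mask, 50, 150)                    # outline of hot regions
n, _, _, centroids = cv2.connectedComponentsWithStats(mask)
prompts = centroids[1:]                             # skip background component
print("edge pixels:", int((edges > 0).sum()))
print("point prompts (x, y):", prompts.round(1))
```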
Existing edge detection methods often suffer from noise amplification and excessive retention of non-salient details, limiting their applicability in high-precision industrial scenarios. To address these challenges, we propose CAM-EDIT, a novel framework that integrates a Channel Attention Mechanism (CAM) and Edge Detection via Independence Testing (EDIT). The CAM module adaptively enhances discriminative edge features through multi-channel fusion, while the EDIT module employs region-wise statistical independence analysis (using Fisher's exact test and the chi-square test) to suppress uncorrelated noise. Extensive experiments on the BSDS500 and NYUDv2 datasets demonstrate state-of-the-art performance. Among the nine comparison algorithms, the F-measure scores of CAM-EDIT are 0.635 and 0.460, representing improvements of 19.2% to 26.5% over traditional methods (Canny, CannySR), and better than the latest learning-based methods (TIP2020, MSCNGP). Noise robustness evaluations further reveal a 2.2% PSNR improvement under Gaussian noise compared to baseline methods. Qualitative results exhibit cleaner edge maps with reduced artifacts, demonstrating its potential for high-precision industrial applications.
https://arxiv.org/abs/2505.01040
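A hedged sketch of what region-wise statistical independence analysis can look like with SciPy's Fisher's exact test: responses in a candidate region and a neighbouring region are binarized and tested for association, and only regions whose responses are statistically dependent are kept as structure. The pairing scheme and significance level are assumptions.

```python
# Hedged sketch of region-wise independence testing with Fisher's exact test:
# binarize the responses of a candidate region and a neighbouring region, test
# their association, and keep the region only when independence is rejected.
# The region pairing scheme and the 0.05 level are illustrative assumptions.
import numpy as np
from scipy.stats import fisher_exact

def is_structured(region_a, region_b, thresh, alpha=0.05):
    a = (region_a > thresh).ravel()
    b = (region_b > thresh).ravel()
    table = [[np.sum(a & b), np.sum(a & ~b)],
             [np.sum(~a & b), np.sum(~a & ~b)]]
    _, p = fisher_exact(table)
    return p < alpha              # dependent responses suggest a real edge

rng = np.random.default_rng(0)
edge = np.tile(np.linspace(0, 1, 8), (8, 1))        # coherent gradient pattern
print(is_structured(edge, edge + 0.01 * rng.random((8, 8)), 0.5))  # True
print(is_structured(rng.random((8, 8)), rng.random((8, 8)), 0.5))  # likely False
```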
Edge detection is crucial in image processing, but existing methods often produce overly detailed edge maps, affecting clarity. Fixed-window statistical testing faces issues such as scale mismatch and computational redundancy. To address these, we propose EDD-MAIT, a novel Multi-scale Adaptive Independence Testing-based Edge Detection and Denoising method that integrates a channel attention mechanism with independence testing. A gradient-driven adaptive window strategy adjusts window sizes dynamically, improving detail preservation and noise suppression. EDD-MAIT achieves better robustness, accuracy, and efficiency, outperforming traditional and learning-based methods on the BSDS500 and BIPED datasets, with improvements in F-score, MSE, and PSNR, and reduced runtime. It also shows robustness against Gaussian noise, generating accurate and clean edge maps in noisy environments.
https://arxiv.org/abs/2505.01032
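A minimal sketch of a gradient-driven window schedule in NumPy: flat areas receive larger testing windows (stronger denoising) and high-gradient areas smaller ones (detail preservation). The linear mapping and the 3-to-11 size range are assumptions, not the paper's exact rule.

```python
# Sketch of a gradient-driven adaptive window: flat areas get a large testing
# window (stronger denoising), high-gradient areas a small one (detail
# preservation). The linear mapping and the 3..11 size range are assumptions.
import numpy as np

def adaptive_window_sizes(img, smin=3, smax=11):
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    norm = mag / (mag.max() + 1e-12)
    sizes = smax - norm * (smax - smin)         # high gradient -> small window
    return sizes.astype(int) // 2 * 2 + 1       # force odd window sizes

img = np.tile(np.linspace(0, 255, 64), (64, 1))
img[:, :8] = 0                                  # flat region on the left
print(np.unique(adaptive_window_sizes(img)))
```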
Here, we propose Deep CS-TRD, a new automatic algorithm for detecting tree rings in whole cross-sections. It substitutes the edge detection step of CS-TRD with a deep-learning-based approach (U-Net), which allows the method to be applied to different image domains (microscopy, scanner, or smartphone acquired) and species (Pinus taeda, Gleditsia triacanthos, and Salix glauca). Additionally, we introduce two publicly available datasets of annotated images to the community. The proposed method outperforms state-of-the-art approaches on macro images (Pinus taeda and Gleditsia triacanthos) while showing slightly lower performance on microscopy images of Salix glauca. To our knowledge, this is the first paper that studies automatic tree ring detection for such different species and acquisition conditions. The dataset and source code are available in this https URL
https://arxiv.org/abs/2504.16242
Edge detection has attracted considerable attention thanks to its exceptional ability to enhance performance in downstream computer vision tasks. In recent years, various deep learning methods have been explored for edge detection tasks, resulting in significant performance improvements compared to conventional computer vision algorithms. In neural networks, edge detection tasks require considerably large receptive fields to provide satisfactory performance. In a typical convolutional operation, such a large receptive field can be achieved by utilizing a significant number of consecutive layers, which yields deep network structures. Recently, a Multi-scale Tensorial Summation (MTS) factorization operator was presented, which can achieve very large receptive fields even from the initial layers. In this paper, we propose a novel MTS Dimensional Reduction (MTS-DR) module guided neural network, MTS-DR-Net, for the edge detection task. MTS-DR-Net uses MTS layers and corresponding MTS-DR blocks as a new backbone to remove redundant information initially. Such a dimensional reduction module enables the neural network to focus specifically on relevant information (i.e., necessary subspaces). Finally, a weighted U-shaped refinement module follows the MTS-DR blocks in MTS-DR-Net. We conducted extensive experiments on two benchmark edge detection datasets, BSDS500 and BIPEDv2, to verify the effectiveness of our model. The implementation of the proposed MTS-DR-Net can be found at this https URL.
https://arxiv.org/abs/2504.15770
This paper presents a comprehensive evaluation framework for image segmentation algorithms, encompassing naive methods, machine learning approaches, and deep learning techniques. We begin by introducing the fundamental concepts and importance of image segmentation, and the role of interactive segmentation in enhancing accuracy. A detailed background theory section explores various segmentation methods, including thresholding, edge detection, region growing, feature extraction, random forests, support vector machines, convolutional neural networks, U-Net, and Mask R-CNN. The implementation and experimental setup are thoroughly described, highlighting three primary approaches: algorithm assisting user, user assisting algorithm, and hybrid methods. Evaluation metrics such as Intersection over Union (IoU), computation time, and user interaction time are employed to measure performance. A comparative analysis presents detailed results, emphasizing the strengths, limitations, and trade-offs of each method. The paper concludes with insights into the practical applicability of these approaches across various scenarios and outlines future work, focusing on expanding datasets, developing more representative approaches, integrating real-time feedback, and exploring weakly supervised and self-supervised learning paradigms to enhance segmentation accuracy and efficiency. Keywords: Image Segmentation, Interactive Segmentation, Machine Learning, Deep Learning, Computer Vision
https://arxiv.org/abs/2504.04435
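For reference, the survey's primary metric, Intersection over Union, takes only a few lines:

```python
# Intersection over Union for binary masks: |A ∩ B| / |A ∪ B|,
# where 1.0 indicates a perfect segmentation.
import numpy as np

def iou(pred, gt):
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0                      # both empty: define as perfect
    return np.logical_and(pred, gt).sum() / union

a = np.zeros((10, 10)); a[2:7, 2:7] = 1
b = np.zeros((10, 10)); b[3:8, 3:8] = 1
print(round(iou(a, b), 3))              # 16 / 34 ≈ 0.471
```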
Edge detection remains a fundamental yet challenging task in computer vision, especially under varying illumination, noise, and complex scene conditions. This paper introduces a Hybrid Multi-Stage Learning Framework that integrates Convolutional Neural Network (CNN) feature extraction with a Support Vector Machine (SVM) classifier to improve edge localization and structural accuracy. Unlike conventional end-to-end deep learning models, our approach decouples feature representation and classification stages, enhancing robustness and interpretability. Extensive experiments conducted on benchmark datasets such as BSDS500 and NYUDv2 demonstrate that the proposed framework outperforms traditional edge detectors and even recent learning-based methods in terms of Optimal Dataset Scale (ODS) and Optimal Image Scale (OIS), while maintaining competitive Average Precision (AP). Both qualitative and quantitative results highlight enhanced performance on edge continuity, noise suppression, and perceptual clarity achieved by our method. This work not only bridges classical and deep learning paradigms but also sets a new direction for scalable, interpretable, and high-quality edge detection solutions.
https://arxiv.org/abs/2503.21827
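A hedged sketch of the decoupled two-stage design: a frozen CNN embeds image patches, and an SVM performs the edge/non-edge decision. The toy extractor and random data are illustrative stand-ins for the paper's actual architecture and features.

```python
# Sketch of the decoupled pipeline: stage 1 extracts per-patch CNN features,
# stage 2 classifies edge vs non-edge with an SVM. The tiny network and the
# random patches/labels are illustrative stand-ins, not the paper's setup.
import numpy as np
import torch
import torch.nn as nn
from sklearn.svm import SVC

cnn = nn.Sequential(                    # frozen feature extractor (stand-in)
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
).eval()

rng = np.random.default_rng(0)
patches = torch.tensor(rng.random((200, 1, 17, 17)), dtype=torch.float32)
labels = rng.integers(0, 2, 200)        # 1 = centre pixel is an edge

with torch.no_grad():
    feats = cnn(patches).numpy()        # stage 1: feature representation

svm = SVC(kernel="rbf").fit(feats[:150], labels[:150])   # stage 2: classify
print("held-out accuracy:", svm.score(feats[150:], labels[150:]))
```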
The CLIP model has demonstrated significant advancements in aligning visual and language modalities through large-scale pre-training on image-text pairs, enabling strong zero-shot classification and retrieval capabilities on various domains. However, CLIP's training remains computationally intensive, with high demands on both data processing and memory. To address these challenges, recent masking strategies have emerged, focusing on the selective removal of image patches to improve training efficiency. Although effective, these methods often compromise key semantic information, resulting in suboptimal alignment between visual features and text descriptions. In this work, we present a concise yet effective approach called Patch Generation-to-Selection to enhance CLIP's training efficiency while preserving critical semantic content. Our method introduces a gradual masking process in which a small set of candidate patches is first pre-selected as potential mask regions. Then, we apply Sobel edge detection across the entire image to generate an edge mask that prioritizes the retention of the primary object areas. Finally, similarity scores between the candidate mask patches and their neighboring patches are computed, with optimal transport normalization refining the selection process to ensure a balanced similarity matrix. Our approach, CLIP-PGS, sets new state-of-the-art results in zero-shot classification and retrieval tasks, achieving superior performance in robustness evaluation and language compositionality benchmarks.
https://arxiv.org/abs/2503.17080
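The edge-mask step can be illustrated compactly: Sobel magnitudes are pooled per ViT patch and the highest-scoring patches are protected from masking. Patch size, the keep ratio, and the pooling are assumptions; the candidate pre-selection and optimal-transport normalization are omitted.

```python
# Sketch of edge-prioritised patch scoring: run Sobel over the image, average
# edge magnitude within each ViT patch, and protect the highest-scoring
# patches from masking. Patch size and keep ratio are illustrative.
import numpy as np
from scipy import ndimage

def patch_edge_scores(img, patch=16):
    gx = ndimage.sobel(img.astype(float), axis=1)
    gy = ndimage.sobel(img.astype(float), axis=0)
    mag = np.hypot(gx, gy)
    h, w = mag.shape
    grid = mag[:h - h % patch, :w - w % patch]
    grid = grid.reshape(h // patch, patch, w // patch, patch).mean(axis=(1, 3))
    return grid                                   # one score per patch

rng = np.random.default_rng(0)
img = rng.random((224, 224)) * 20
img[64:160, 64:160] += 200                        # strong-edged "object"
scores = patch_edge_scores(img)
keep = scores > np.quantile(scores, 0.5)          # retain top half of patches
print(keep.sum(), "of", keep.size, "patches protected from masking")
```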
This study presents a novel approach for roof detail extraction and vectorization using remote sensing images. Unlike previous geometric-primitive-based methods that rely on the detection of corners, our method focuses on edge detection as the primary mechanism for roof reconstruction, while utilizing geometric relationships to define corners and faces. We adapt the YOLOv8 OBB model, originally designed for rotated object detection, to extract roof edges effectively. Our method demonstrates robustness against noise and occlusion, leading to precise vectorized representations of building roofs. Experiments conducted on the SGA and Melville datasets highlight the method's effectiveness. At the raster level, our model outperforms the state-of-the-art foundation segmentation model (SAM), achieving a mIoU between 0.85 and 1 for most samples and an ovIoU close to 0.97. At the vector level, evaluation using the Hausdorff distance, the PolyS metric, and our raster-vector metric demonstrates significant improvements after polygonization, with a close approximation to the reference data. The method successfully handles diverse roof structures and refines edge gaps, even on complex roof structures from new datasets excluded from training. Our findings underscore the potential of this approach to address challenges in automatic roof structure vectorization, supporting various applications such as urban terrain reconstruction.
https://arxiv.org/abs/2503.09187
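As a pointer to the vector-level evaluation, here is a minimal SciPy computation of the symmetric Hausdorff distance between a predicted polygon and its reference; coordinates are toy values, and a real evaluation would densify polygon edges before measuring.

```python
# Sketch of the vector-level evaluation: symmetric Hausdorff distance between
# a predicted roof polygon and the reference, via SciPy. Toy coordinates;
# real evaluation would sample points densely along each polygon edge.
import numpy as np
from scipy.spatial.distance import directed_hausdorff

pred = np.array([[0, 0], [10, 0.4], [10.2, 10], [0, 9.8]], float)
ref = np.array([[0, 0], [10, 0], [10, 10], [0, 10]], float)

d = max(directed_hausdorff(pred, ref)[0], directed_hausdorff(ref, pred)[0])
print(f"Hausdorff distance: {d:.2f}")
```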