Efficient use of cultivated areas is a necessary factor for sustainable agricultural development and food security. Along with the rapid development of satellite technologies in developed countries, new methods are being sought for the accurate and timely identification of cultivated areas. In this context, identifying cropland boundaries through spectral analysis of satellite imagery is considered one of the most accurate and efficient methods in modern agriculture. This article proposes a new approach to determining the suitability and green index of cultivated areas using satellite data obtained through the "Google Earth Engine" (GEE) platform. The approach combines two powerful algorithms: "SNIC (Simple Non-Iterative Clustering) Super Pixels" and the "Canny Edge Detection Method". The SNIC algorithm groups pixels in a satellite image into larger regions (superpixels) with similar characteristics, thereby enabling better image analysis. The Canny Edge Detection Method detects sharp changes (edges) in the image to determine the precise boundaries of agricultural fields. This study, carried out using high-resolution multispectral data from the Sentinel-2 satellite and the Google Earth Engine JavaScript API, shows that the proposed method classifies randomly selected agricultural fields accurately and reliably. The combined use of these two tools allows the boundaries of agricultural fields to be determined more accurately by minimizing the effects of outliers in satellite images. As a result, more accurate and reliable maps can be created for agricultural monitoring and resource management over large areas, expanding the application of cloud-based platforms and artificial intelligence methods in agriculture.
https://arxiv.org/abs/2502.04529
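To make the superpixel-plus-edge idea concrete outside of GEE, here is a minimal NumPy sketch (not the paper's GEE implementation): pixels are assigned to grid seeds in a single pass using a joint intensity-and-space distance, a simplification of SNIC's non-iterative region growing, and label transitions then serve as field boundaries in place of a full Canny pass. Image size, grid density, and the compactness weight are all illustrative assumptions.

```python
import numpy as np

def snic_like_superpixels(img, grid=4, compactness=0.1):
    """Assign each pixel to the nearest grid seed under a joint
    (intensity, space) distance: a one-pass simplification of
    SNIC's non-iterative clustering (illustrative only)."""
    h, w = img.shape
    ys = np.linspace(0, h - 1, grid).astype(int)
    xs = np.linspace(0, w - 1, grid).astype(int)
    seeds = [(y, x, img[y, x]) for y in ys for x in xs]
    yy, xx = np.mgrid[0:h, 0:w]
    best = np.full((h, w), np.inf)
    labels = np.zeros((h, w), dtype=int)
    for k, (sy, sx, sv) in enumerate(seeds):
        spatial = ((yy - sy) ** 2 + (xx - sx) ** 2) / (h * w)
        colour = (img - sv) ** 2
        d = colour + compactness * spatial
        mask = d < best
        best[mask] = d[mask]
        labels[mask] = k
    return labels

def label_boundaries(labels):
    """Mark pixels where the superpixel label changes (edge map)."""
    b = np.zeros_like(labels, dtype=bool)
    b[:, 1:] |= labels[:, 1:] != labels[:, :-1]
    b[1:, :] |= labels[1:, :] != labels[:-1, :]
    return b

# Synthetic "NDVI" scene: a bright field on a dark background.
img = np.zeros((32, 32))
img[8:24, 8:24] = 0.8
labels = snic_like_superpixels(img)
edges = label_boundaries(labels)
```

The boundary map traces the field outline because color distance dominates the assignment when the compactness weight is small.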
Edge detection is a cornerstone of image processing, yet existing methods often face critical limitations. Traditional deep learning edge detection methods require extensive training datasets and fine-tuning, while classical techniques often fail in complex or noisy scenarios, limiting their real-world applicability. To address these limitations, we propose a training-free, quantum-inspired edge detection model. Our approach integrates classical Sobel edge detection, refinement based on the Schrödinger wave equation, and a hybrid framework combining Canny and Laplacian operators. By eliminating the need for training, the model is lightweight and adaptable to diverse applications. The Schrödinger wave equation refines gradient-based edge maps through iterative diffusion, significantly enhancing edge precision. The hybrid framework further strengthens the model by synergistically combining local and global features, ensuring robustness even under challenging conditions. Extensive evaluations on datasets like BIPED, Multicue, and NYUD demonstrate superior performance of the proposed model, achieving state-of-the-art metrics, including ODS, OIS, AP, and F-measure. Noise robustness experiments highlight its reliability, showcasing its practicality for real-world scenarios. Due to its versatile and adaptable nature, our model is well-suited for applications such as medical imaging, autonomous systems, and environmental monitoring, setting a new benchmark for edge detection.
https://arxiv.org/abs/2501.18929
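The two core stages, a Sobel gradient map followed by iterative diffusion, can be sketched as below. Note the diffusion step is the imaginary-time (heat equation) analogue of free Schrödinger evolution, used here as a stand-in for the paper's refinement; the kernel sizes, step count, and step size are assumptions.

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude with 3x3 Sobel kernels (zero-padded)."""
    p = np.pad(img, 1)
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[1:-1, :-2] - p[2:, :-2])
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[:-2, 1:-1] - p[:-2, 2:])
    return np.hypot(gx, gy)

def diffuse(edge_map, steps=5, dt=0.1):
    """Iterative diffusion of the edge map, the imaginary-time
    (diffusion) analogue of free Schrodinger evolution; a stand-in
    for the paper's wave-equation refinement (assumption)."""
    u = edge_map.astype(float).copy()
    for _ in range(steps):
        lap = (np.roll(u, 1, 0) + np.roll(u, -1, 0)
               + np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4 * u)
        u += dt * lap
    return u

img = np.zeros((16, 16))
img[:, 8:] = 1.0            # vertical step edge at column 8
raw = sobel_magnitude(img)
refined = diffuse(raw)
```

After a few diffusion steps the response remains peaked at the true edge while spreading smoothly into neighboring pixels, which is what allows thresholding to localize edges more stably under noise.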
Asteroid exploration is a pertinent challenge due to the varying complexity of asteroids' dynamical environments and shapes, as well as communication delays caused by distance. Thus, autonomous navigation methods are continually being developed and improved in current research to enable safe exploration. These methods often involve using horizon-based Optical Navigation (OpNav) to determine the spacecraft's location, which is reliant on the visibility of the horizon. It is critical to ensure the reliability of this measurement such that the spacecraft may maintain an accurate state estimate throughout its mission. This paper presents an algorithm that generates control maneuvers for spacecraft to follow trajectories that allow continuously usable optical measurements to maintain system observability for safe navigation. This algorithm improves upon existing asteroid navigation capabilities by allowing the safe and robust autonomous targeting of various trajectories and orbits at a wide range of distances within optical measurement range. It is adaptable to different asteroid scenarios. Overall, the approach develops an all-encompassing system that simulates the asteroid dynamics, synthetic image generation, edge detection, horizon-based OpNav, filtering and observability-enhancing control.
https://arxiv.org/abs/2501.15806
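Horizon-based OpNav ultimately reduces to fitting the observed limb. As a small self-contained illustration (synthetic data, not the paper's pipeline), the sketch below recovers the apparent center and radius of a body from noisy edge points on a partial limb arc, using the algebraic (Kasa) least-squares circle fit; the arc extent and noise level are assumptions.

```python
import numpy as np

def fit_circle(points):
    """Algebraic (Kasa) least-squares circle fit to 2-D edge points:
    solve x^2 + y^2 = 2a*x + 2b*y + c for the center (a, b)."""
    x, y = points[:, 0], points[:, 1]
    A = np.column_stack([2 * x, 2 * y, np.ones_like(x)])
    rhs = x ** 2 + y ** 2
    (a, b, c), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    r = np.sqrt(c + a ** 2 + b ** 2)
    return np.array([a, b]), r

# Synthetic horizon: noisy samples along a partially lit limb arc.
rng = np.random.default_rng(0)
theta = np.linspace(0.2, 1.8, 60)         # only part of the limb is visible
true_c, true_r = np.array([40.0, 25.0]), 12.0
pts = true_c + true_r * np.column_stack([np.cos(theta), np.sin(theta)])
pts += rng.normal(scale=0.05, size=pts.shape)
centre, radius = fit_circle(pts)
```

The fitted center gives the bearing to the body, which is the core observable that the paper's control strategy works to keep continuously available.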
Snapshot compressive imaging (SCI) is a promising technique for capturing high-speed video at low bandwidth and low power, typically by compressing multiple frames into a single measurement. However, similar to traditional CMOS image sensor based imaging systems, SCI also faces challenges in low-light, photon-limited, and low-signal-to-noise-ratio conditions. In this paper, we propose a novel Compressive Denoising Autoencoder (CompDAE) using the STFormer architecture as the backbone, to explicitly model noise characteristics and provide computer vision functionalities such as edge detection and depth estimation directly from compressed sensing measurements, while accounting for realistic low-photon conditions. We evaluate the effectiveness of CompDAE across various datasets and demonstrate significant improvements in task performance compared to conventional RGB-based methods. Under ultra-low-light conditions (APC $\leq$ 20), where conventional methods fail, the proposed algorithm still maintains competitive performance.
https://arxiv.org/abs/2501.15122
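The SCI forward model the abstract refers to, mask-modulated frames summed into one measurement and corrupted by photon (Poisson) noise, can be sketched as follows. The photon scaling and mask statistics here are illustrative assumptions, not the paper's calibration.

```python
import numpy as np

def sci_measurement(frames, masks, photons_per_frame=10.0, rng=None):
    """Snapshot compressive imaging forward model: modulate each frame
    with a binary mask, sum into a single compressed measurement, then
    apply Poisson photon noise (illustrative low-photon model)."""
    if rng is None:
        rng = np.random.default_rng(0)
    clean = (frames * masks).sum(axis=0)    # one compressed frame
    scale = photons_per_frame               # assumed mean photons/pixel
    return rng.poisson(clean * scale) / scale

T, H, W = 8, 16, 16
rng = np.random.default_rng(1)
frames = rng.random((T, H, W))
masks = rng.integers(0, 2, size=(T, H, W)).astype(float)
y = sci_measurement(frames, masks, photons_per_frame=5.0, rng=rng)
```

A network like CompDAE takes `y` (and the masks) as input and produces task outputs such as edge maps directly, rather than first reconstructing the RGB frames.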
Electrical impedance tomography (EIT) is a non-invasive imaging method for recovering the internal conductivity of a physical body from electric boundary measurements. EIT combined with machine learning has shown promise for the classification of strokes. However, most previous works have used raw EIT voltage data as network inputs. We build upon a recent development which suggested the use of special noise-robust Virtual Hybrid Edge Detection (VHED) functions as network inputs, although that work used only highly simplified and mathematically ideal models. In this work we strengthen the case for the use of EIT, and VHED functions especially, for stroke classification. We design models with high detail and mathematical realism to test the use of VHED functions as inputs. Virtual patients are created using a physically detailed 2D head model which includes features known to create challenges in real-world imaging scenarios. Conductivity values are drawn from statistically realistic distributions, and phantoms are afflicted with either hemorrhagic or ischemic strokes of various shapes and sizes. Simulated noisy EIT electrode data, generated using the realistic Complete Electrode Model (CEM) as opposed to the mathematically ideal continuum model, is processed to obtain VHED functions. We compare the use of VHED functions as inputs against the alternative paradigm of using raw EIT voltages. Our results show that (i) stroke classification can be performed with high accuracy using 2D EIT data from physically detailed and mathematically realistic models, and (ii) in the presence of noise, VHED functions outperform raw data as network inputs.
https://arxiv.org/abs/2501.14704
Recent advancements have demonstrated the effectiveness of the extractor-selector (E-S) framework in edge detection (ED) tasks, which achieves state-of-the-art (SOTA) performance in both quantitative metrics and perceptual quality. However, this method still falls short of fully exploiting the potential of feature extractors, as selectors only operate on highly compressed feature maps that lack diversity and suffer from substantial information loss. Additionally, while union training can improve perceptual quality, the highest evaluation scores are typically obtained without it, creating a trade-off between quantitative accuracy and perceptual fidelity. To address these limitations, we propose an enhanced E-S architecture, which utilizes richer, lower-loss feature representations and incorporates auxiliary features during the selection process, thereby improving the effectiveness of the feature selection mechanism. Additionally, we introduce a novel loss function, the Symmetrization Weight Binary Cross-Entropy (SWBCE), which simultaneously emphasizes both the recall of edge pixels and the suppression of erroneous edge predictions, thereby improving both the perceptual quality and the accuracy of the predictions. The effectiveness and superiority of our approaches over baseline models, the standard E-S framework, and the standard Weight Binary Cross-Entropy (WBCE) loss function are demonstrated by extensive experiments. For example, our enhanced E-S architecture trained with the SWBCE loss function achieves average improvements of 8.25$\%$, 8.01$\%$, and 33.25$\%$ in ODS, OIS, and AP, measured on BIPED2 compared with the baseline models, significantly outperforming the standard E-S method. The results set new benchmarks for ED tasks, and highlight the potential of the methods beyond edge detection.
https://arxiv.org/abs/2501.13365
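To illustrate the shape of such a loss, here is a sketch of a symmetrized weighted BCE: the usual class-balanced WBCE averaged with its label-swapped mirror, so that spurious edges are penalized as strongly as missed ones. The paper's exact SWBCE formulation may differ; this only conveys the symmetrization idea.

```python
import numpy as np

def wbce(pred, label, eps=1e-7):
    """Class-balanced weighted BCE for edge maps: positives are
    up-weighted by the non-edge fraction (HED-style balancing)."""
    beta = 1.0 - label.mean()            # fraction of non-edge pixels
    p = np.clip(pred, eps, 1 - eps)
    return -(beta * label * np.log(p)
             + (1 - beta) * (1 - label) * np.log(1 - p)).mean()

def swbce(pred, label):
    """Sketch of a symmetrized weighted BCE: average WBCE with its
    label-swapped mirror so false edge predictions are suppressed as
    strongly as missed edges are recalled (assumed formulation)."""
    return 0.5 * (wbce(pred, label) + wbce(1 - pred, 1 - label))

label = np.zeros((8, 8)); label[4, :] = 1.0    # one thin edge row
good = np.where(label > 0, 0.9, 0.1)           # clean prediction
noisy = np.where(label > 0, 0.9, 0.4)          # spurious edge responses
```

Under this loss, the prediction with spurious background responses scores strictly worse even though its recall of true edge pixels is identical.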
Transformer-based models have made significant progress in edge detection, but their high computational cost is prohibitive. Recently, vision Mamba has shown excellent ability in efficiently capturing long-range dependencies. Drawing inspiration from this, we propose a novel edge detector with Mamba, termed EDMB, to efficiently generate high-quality multi-granularity edges. In EDMB, Mamba is combined with a global-local architecture, therefore it can focus on both global information and fine-grained cues. The fine-grained cues play a crucial role in edge detection, but are usually ignored by ordinary Mamba. We design a novel decoder to construct learnable Gaussian distributions by fusing global features and fine-grained features. Multi-granularity edges are then generated by sampling from the distributions. In order to make multi-granularity edges applicable to single-label data, we introduce Evidence Lower Bound loss to supervise the learning of the distributions. On the multi-label dataset BSDS500, our proposed EDMB achieves competitive single-granularity ODS 0.837 and multi-granularity ODS 0.851 without multi-scale test or extra PASCAL-VOC data. Remarkably, EDMB can be extended to single-label datasets such as NYUDv2 and BIPED. The source code is available at this https URL.
https://arxiv.org/abs/2501.04846
This paper explores Masked Autoencoders (MAE) with Gaussian Splatting. While reconstructive self-supervised learning frameworks such as MAE learn good semantic abstractions, they are not trained for explicit spatial awareness. Our approach, named Gaussian Masked Autoencoder, or GMAE, aims to learn semantic abstractions and spatial understanding jointly. Like MAE, it reconstructs the image end-to-end in the pixel space, but beyond MAE, it also introduces an intermediate, 3D Gaussian-based representation and renders images via splatting. We show that GMAE can enable various zero-shot learning capabilities of spatial understanding (e.g., figure-ground segmentation, image layering, edge detection, etc.) while preserving the high-level semantics of self-supervised representation quality from MAE. To our knowledge, we are the first to employ Gaussian primitives in an image representation learning framework beyond optimization-based single-scene reconstructions. We believe GMAE will inspire further research in this direction and contribute to developing next-generation techniques for modeling high-fidelity visual data. More details at this https URL
https://arxiv.org/abs/2501.03229
Although deep convolutional neural networks (CNNs) have significantly enhanced performance in image edge detection (ED), current models remain highly dependent on post-processing techniques such as non-maximum suppression (NMS), and often fail to deliver satisfactory perceptual results, while performance deteriorates significantly as the allowed error tolerance distance decreases. These limitations arise from the uniform fusion of features across all pixels, regardless of their specific characteristics, such as the distinction between textural and edge areas. If the features extracted by ED models are selected more meticulously and encompass greater diversity, the resulting predictions are expected to be more accurate and perceptually meaningful. Motivated by this observation, this paper proposes a novel feature selection paradigm for deep networks that facilitates the differential selection of features and can be seamlessly integrated into existing ED models. By incorporating this additional structure, the performance of conventional ED models is substantially enhanced without post-processing, while simultaneously improving the perceptual quality of the predictions. Extensive experimental evaluations validate the effectiveness of the proposed model.
https://arxiv.org/abs/2501.02534
Knowledge distillation has been successfully applied to various audio tasks, but its potential in underwater passive sonar target classification remains relatively unexplored. Existing methods often focus on high-level contextual information while overlooking essential low-level audio texture features needed to capture local patterns in sonar data. To address this gap, the Structural and Statistical Audio Texture Knowledge Distillation (SSATKD) framework is proposed for passive sonar target classification. SSATKD combines high-level contextual information with low-level audio textures by utilizing an Edge Detection Module for structural texture extraction and a Statistical Knowledge Extractor Module to capture signal variability and distribution. Experimental results confirm that SSATKD improves classification accuracy while optimizing memory and computational resources, making it well-suited for resource-constrained environments.
https://arxiv.org/abs/2501.01921
Edge computing has emerged as a key paradigm for deploying deep learning-based object detection in time-sensitive scenarios. However, existing edge detection methods face challenges: 1) difficulty balancing detection precision with lightweight models, 2) limited adaptability of generalized deployment designs, and 3) insufficient real-world validation. To address these issues, we propose the Edge Detection Toolbox (ED-TOOLBOX), which utilizes generalizable plug-and-play components to adapt object detection models for edge environments. Specifically, we introduce a lightweight Reparameterized Dynamic Convolutional Network (Rep-DConvNet) featuring weighted multi-shape convolutional branches to enhance detection performance. Additionally, we design a Sparse Cross-Attention (SC-A) network with a localized-mapping-assisted self-attention mechanism, enabling a well-crafted joint module for adaptive feature transfer. For real-world applications, we incorporate an Efficient Head into the YOLO framework to accelerate edge model optimization. To demonstrate practical impact, we identify a gap in helmet detection -- overlooking band fastening, a critical safety factor -- and create the Helmet Band Detection Dataset (HBDD). Using ED-TOOLBOX-optimized models, we address this real-world task. Extensive experiments validate the effectiveness of ED-TOOLBOX, with edge detection models outperforming six state-of-the-art methods in visual surveillance simulations, achieving real-time and accurate performance. These results highlight ED-TOOLBOX as a superior solution for edge object detection.
https://arxiv.org/abs/2412.18230
Early detection of illnesses and pest infestations in fruit cultivation is critical for maintaining yield quality and plant health. Computer vision and robotics are increasingly employed for the automatic detection of such issues, particularly using data-driven solutions. However, the rarity of these problems makes acquiring and processing the necessary data to train such algorithms a significant obstacle. One solution to this scarcity is the generation of synthetic high-quality anomalous samples. While numerous methods exist for this task, most require highly trained individuals for setup. This work addresses the challenge of generating synthetic anomalies in an automatic fashion that requires only an initial collection of normal and anomalous samples from the user - a task that is straightforward for farmers. We demonstrate the approach in the context of table grape cultivation. Specifically, based on the observation that normal berries present relatively smooth surfaces, while defects result in more complex textures, we introduce a Dual-Canny Edge Detection (DCED) filter. This filter emphasizes the additional texture indicative of diseases, pest infestations, or other defects. Using segmentation masks provided by the Segment Anything Model, we then select and seamlessly blend anomalous berries onto normal ones. We show that the proposed dataset augmentation technique improves the accuracy of an anomaly classifier for table grapes and that the approach can be generalized to other fruit types.
https://arxiv.org/abs/2412.12949
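The DCED intuition, that defective berries show extra fine texture which a sensitive edge pass picks up but a strict pass does not, can be sketched as below. Plain gradient thresholding at two levels stands in for the two Canny passes, and the thresholds and synthetic "berries" are illustrative assumptions.

```python
import numpy as np

def grad_mag(img):
    """Central-difference gradient magnitude (stand-in for the
    smoothed-gradient stage of Canny)."""
    gy, gx = np.gradient(img.astype(float))
    return np.hypot(gx, gy)

def dual_edge_texture(img, lo=0.05, hi=0.4):
    """DCED-style sketch: a sensitive pass (low threshold) picks up
    fine surface texture, a strict pass (high threshold) keeps only
    strong contours; keeping the in-between band emphasizes the extra
    texture typical of defects. Thresholds are illustrative and simple
    thresholding replaces full Canny (assumption)."""
    g = grad_mag(img)
    return (g > lo) & ~(g > hi)

rng = np.random.default_rng(0)
smooth_berry = np.full((16, 16), 0.5)                    # healthy: smooth
rough_berry = 0.5 + 0.2 * rng.standard_normal((16, 16))  # defect: textured
smooth_score = dual_edge_texture(smooth_berry).mean()
rough_score = dual_edge_texture(rough_berry).mean()
```

The texture score separates the two surfaces, which is what makes such a filter useful for selecting convincing anomalous patches to blend onto normal fruit.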
Edge labels are typically at various granularity levels owing to the varying preferences of annotators, thus handling the subjectivity of per-pixel labels has been a focal point for edge detection. Previous methods often employ a simple voting strategy to diminish such label uncertainty or impose a strong assumption of labels with a pre-defined distribution, e.g., Gaussian. In this work, we unveil that the segment anything model (SAM) provides strong prior knowledge to model the uncertainty in edge labels. Our key insight is that the intermediate SAM features inherently correspond to object edges at various granularities, which reflects different edge options due to uncertainty. Therefore, we attempt to align uncertainty with granularity by regressing intermediate SAM features from different layers to object edges at multi-granularity levels. In doing so, the model can fully and explicitly explore diverse ``uncertainties'' in a data-driven fashion. Specifically, we inject a lightweight module (~ 1.5% additional parameters) into the frozen SAM to progressively fuse and adapt its intermediate features to estimate edges from coarse to fine. It is crucial to normalize the granularity level of human edge labels to match their innate uncertainty. For this, we simply perform linear blending to the real edge labels at hand to create pseudo labels with varying granularities. Consequently, our uncertainty-aligned edge detector can flexibly produce edges at any desired granularity (including an optimal one). Thanks to SAM, our model uniquely demonstrates strong generalizability for cross-dataset edge detection. Extensive experimental results on BSDS500, Multicue and NYUDv2 validate our model's superiority.
https://arxiv.org/abs/2412.12892
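One plausible reading of the linear-blending step is sketched below: average the annotators' binary edge maps into a consensus map, then threshold it at several agreement levels to obtain pseudo labels of varying granularity. The exact blending used in the paper may differ; the threshold levels are assumptions.

```python
import numpy as np

def pseudo_granularity_labels(annotations, alphas=(0.25, 0.5, 0.75)):
    """Sketch of the linear-blending idea: blend annotators' binary
    edge maps into a consensus in [0, 1], then keep a pixel at
    granularity alpha when at least that fraction of annotators
    marked it (assumed interpretation)."""
    consensus = np.mean(annotations, axis=0)
    return {a: (consensus >= a).astype(float) for a in alphas}

# Three annotators agreeing on one edge, disagreeing on another.
a1 = np.zeros((8, 8)); a1[3, :] = 1; a1[5, :] = 1
a2 = np.zeros((8, 8)); a2[3, :] = 1
a3 = np.zeros((8, 8)); a3[3, :] = 1
labels = pseudo_granularity_labels(np.stack([a1, a2, a3]))
```

Coarse pseudo labels keep only the edge all annotators agree on, while fine pseudo labels retain the contested one, mirroring the uncertainty-to-granularity alignment described in the abstract.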
Digital agents are increasingly employed to automate tasks in interactive digital environments such as web pages, software applications, and operating systems. While text-based agents built on Large Language Models (LLMs) often require frequent updates due to platform-specific APIs, visual agents leveraging Multimodal Large Language Models (MLLMs) offer enhanced adaptability by interacting directly with Graphical User Interfaces (GUIs). However, these agents face significant challenges in visual perception, particularly when handling high-resolution, visually complex digital environments. This paper introduces Iris, a foundational visual agent that addresses these challenges through two key innovations: Information-Sensitive Cropping (ISC) and Self-Refining Dual Learning (SRDL). ISC dynamically identifies and prioritizes visually dense regions using an edge detection algorithm, enabling efficient processing by allocating more computational resources to areas with higher information density. SRDL enhances the agent's ability to handle complex tasks by leveraging a dual-learning loop, where improvements in referring (describing UI elements) reinforce grounding (locating elements) and vice versa, all without requiring additional annotated data. Empirical evaluations demonstrate that Iris achieves state-of-the-art performance across multiple benchmarks with only 850K GUI annotations, outperforming methods using 10x more training data. These improvements further translate to significant gains in both web and OS agent downstream tasks.
https://arxiv.org/abs/2412.10342
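The edge-density cropping idea behind ISC can be sketched as follows: score every candidate crop by the edge energy it contains and keep the densest one. Gradient magnitude stands in for the paper's edge detector, and the screenshot, crop size, and scoring are illustrative assumptions rather than the actual ISC module.

```python
import numpy as np

def densest_crop(img, crop=8):
    """Information-Sensitive Cropping, sketched: score every
    crop-sized window by its edge density (gradient magnitude as the
    edge detector) and return the corner of the densest window."""
    gy, gx = np.gradient(img.astype(float))
    g = np.hypot(gx, gy)
    # integral image for O(1) window sums
    ii = np.pad(g.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    h, w = g.shape
    best, best_yx = -1.0, (0, 0)
    for y in range(h - crop + 1):
        for x in range(w - crop + 1):
            s = (ii[y + crop, x + crop] - ii[y, x + crop]
                 - ii[y + crop, x] + ii[y, x])
            if s > best:
                best, best_yx = s, (y, x)
    return best_yx

# A flat "screenshot" with one visually dense region of UI-like detail.
rng = np.random.default_rng(0)
img = np.zeros((24, 24))
img[12:20, 12:20] = rng.random((8, 8))
y0, x0 = densest_crop(img)
```

The returned window lands on the detailed region, which is where an agent would then spend its higher-resolution processing budget.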
Spinal ligaments are crucial elements in complex biomechanical simulation models, as they transfer forces on the bony structure, guide and limit movements, and stabilize the spine. The spinal ligaments comprise seven major groups responsible for maintaining functional interrelationships among the other spinal components. Determining the ligament origin and insertion points on 3D vertebra models is an essential step in building accurate and complex spine biomechanical models. In our paper, we propose a pipeline that is able to detect 66 spinal ligament attachment points using a step-wise approach. Our method incorporates a fast vertebra registration that strategically extracts only 15 3D points to compute the transformation, and edge detection for a precise projection of the registered ligaments onto any given patient-specific vertebra model. Our method shows high accuracy, particularly in identifying landmarks on the anterior part of the vertebra, with an average distance of 2.24 mm for anterior longitudinal ligament landmarks and 1.26 mm for posterior longitudinal ligament landmarks. The landmark detection requires approximately 3.0 seconds per vertebra, providing a substantial improvement over existing methods. Clinical relevance: using the proposed method, the required landmarks that represent origin and insertion points for forces in biomechanical spine models can be localized automatically in an accurate and time-efficient manner.
https://arxiv.org/abs/2412.05081
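A registration step driven by a handful of corresponding 3D points, as in the 15-point vertebra registration above, is classically solved with the Kabsch/Procrustes algorithm. The sketch below recovers the rigid transform from 15 synthetic landmarks; it illustrates the standard algorithm, not the paper's specific registration.

```python
import numpy as np

def rigid_register(src, dst):
    """Kabsch/Procrustes: least-squares rotation R and translation t
    mapping sparse source points onto destination points."""
    sc, dc = src.mean(0), dst.mean(0)
    H = (src - sc).T @ (dst - dc)
    U, _, Vt = np.linalg.svd(H)
    # reflection guard: force det(R) = +1
    D = np.diag([1, 1, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = dc - R @ sc
    return R, t

rng = np.random.default_rng(0)
src = rng.random((15, 3))                  # 15 landmark points
angle = 0.4
R_true = np.array([[np.cos(angle), -np.sin(angle), 0],
                   [np.sin(angle),  np.cos(angle), 0],
                   [0, 0, 1]])
t_true = np.array([1.0, -2.0, 0.5])
dst = src @ R_true.T + t_true
R, t = rigid_register(src, dst)
```

With the transform in hand, template ligament attachment points can be projected onto the patient-specific vertebra, which is where the pipeline's edge detection then refines their placement.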
Adversarial input image perturbation attacks have emerged as a significant threat to machine learning algorithms, particularly in image classification settings. These attacks involve subtle perturbations to input images that cause neural networks to misclassify them, even though the images remain easily recognizable to humans. One critical area where adversarial attacks have been demonstrated is automotive systems, where traffic sign classification and recognition are essential and misclassified images can cause autonomous systems to take wrong actions. This work presents a new class of adversarial attacks. Unlike existing work that has focused on adversarial perturbations using human-made artifacts, such as adding stickers, paint, or shining flashlights at traffic signs, this work leverages nature-made artifacts: tree leaves. By leveraging nature-made artifacts, the new class of attacks has plausible deniability: a fall leaf stuck to a street sign could come from a nearby tree, rather than be placed there by a malicious human attacker. To evaluate this new class of adversarial input image perturbation attacks, this work analyzes how fall leaves can cause misclassification of street signs. The work evaluates leaves from different species of trees, and considers parameters such as size, color due to tree leaf type, and rotation, demonstrating a high success rate for misclassification. The work also explores the correlation between successful attacks and their effect on edge detection, which is critical in many image classification algorithms.
https://arxiv.org/abs/2411.18776
Stereotactic Body Radiation Therapy (SBRT) can be a precise, minimally invasive treatment method for liver cancer and liver metastases. However, the effectiveness of SBRT relies on the accurate delivery of the dose to the tumor while sparing healthy tissue. Challenges persist in ensuring breath-hold reproducibility, with current methods often requiring manual verification of liver dome positions from kV-triggered images. To address this, we propose a proof-of-principle study of a deep learning-based pipeline to automatically delineate the liver dome from kV-planar images. From 24 patients who received SBRT for liver cancer or metastasis inside the liver, 711 kV-triggered images acquired for online breath-hold verification were included in the current study. We developed a pipeline comprising a trained U-Net for automatic liver dome region segmentation from the triggered images, followed by extraction of the liver dome via thresholding, edge detection, and morphological operations. The performance and generalizability of the pipeline were evaluated using 2-fold cross validation. Training the U-Net model for liver region segmentation took under 30 minutes, and the automatic delineation of a liver dome for any triggered image took less than one second. The RMSE and rate of detection for Fold1 with 366 images were (6.4 +/- 1.6) mm and 91.7%, respectively. For Fold2 with 345 images, the RMSE and rate of detection were (7.7 +/- 2.3) mm and 76.3%, respectively.
https://arxiv.org/abs/2411.15322
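The post-processing stage described above (thresholding the U-Net output, then edge extraction and morphological cleanup) can be sketched as follows. This is a simplified illustration under assumed conventions, not the paper's implementation: here the "edge" is taken as the topmost liver pixel per image column, and the threshold and structuring element sizes are placeholders:

```python
import numpy as np
from scipy.ndimage import binary_fill_holes, binary_opening

def extract_dome_curve(prob_map, threshold=0.5):
    """Recover a liver dome curve from a segmentation probability map.

    Returns, for each image column, the row index of the topmost liver
    pixel (-1 where no liver is present in that column).
    """
    mask = prob_map > threshold                   # thresholding
    mask = binary_fill_holes(mask)                # close interior gaps
    mask = binary_opening(mask, np.ones((3, 3)))  # remove small speckle
    dome = np.full(mask.shape[1], -1, dtype=int)
    for col in range(mask.shape[1]):
        rows = np.nonzero(mask[:, col])[0]
        if rows.size:
            dome[col] = rows[0]                   # top boundary = dome edge
    return dome
```

Comparing this curve against the planned liver dome position would then flag breath-holds that drift beyond tolerance.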
Edge detection has long been one of the most difficult challenges in computer vision because of the difficulty of identifying borders and edges in real-world images containing objects of varying kinds and sizes. Methods based on ensemble learning, which use a combination of backbones and attention modules, have outperformed more conventional approaches such as Sobel and Canny edge detection. Nevertheless, these algorithms are still challenged when faced with complicated scene photos. In addition, the edges identified by current methods are not refined and often include incorrect edges. In this work, we use a Cascaded Ensemble Canny operator to solve these problems and detect object edges. The challenging Fresh and Rotten and Berkeley datasets are used to test the proposed approach in Python. In terms of performance metrics and output image quality, the obtained results outperform the specified edge detection networks.
https://arxiv.org/abs/2411.14868
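The paper's Cascaded Ensemble Canny operator is not specified in the abstract, but the underlying ensemble idea can be illustrated simply: run an edge detector at several sensitivities and keep only pixels a majority of runs agree on. The sketch below is a simplified stand-in that uses a plain Sobel gradient magnitude with multiple thresholds in place of full Canny (no non-maximum suppression or hysteresis), with placeholder threshold values:

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude via 3x3 Sobel filters (pure NumPy, borders left zero)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    ky = kx.T
    h, w = img.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            patch = img[i - 1:i + 2, j - 1:j + 2]
            gx[i, j] = (patch * kx).sum()
            gy[i, j] = (patch * ky).sum()
    return np.hypot(gx, gy)

def ensemble_edges(img, thresholds=(50, 100, 150), min_votes=2):
    """Keep edge pixels that a majority of differently-thresholded runs agree on."""
    mag = sobel_magnitude(img.astype(float))
    votes = sum((mag > t).astype(int) for t in thresholds)
    return votes >= min_votes
```

The voting step is what suppresses the spurious "incorrect edges" mentioned above: an edge surviving only the most permissive threshold is discarded.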
The digitization of complex technical systems, such as Piping and Instrumentation Diagrams (P&IDs), is crucial for efficient maintenance and operation of complex systems in hydraulic and process engineering. Previous approaches often rely on separate modules that analyze diagram elements individually, neglecting the diagram's overall structure. We address this limitation by proposing a novel approach that utilizes the Relationformer, a state-of-the-art deep learning architecture, to extract graphs from P&IDs. Our method leverages the ability of the Relationformer to simultaneously detect objects and their relationships in images, making it suitable for the task of graph extraction from engineering diagrams. We apply our proposed approach to both real-world and synthetically created P&ID datasets, and evaluate its effectiveness by comparing it with a modular digitization approach based on recent literature. We present PID2Graph, the first publicly accessible P&ID dataset featuring comprehensive labels for the graph structure, including symbols, nodes, and their connections, which is used for evaluation. To understand the effect of patching and stitching in both approaches, we compare values before and after merging the patches. For the real-world data, the Relationformer achieves convincing results, outperforming the modular digitization approach for edge detection by more than 25%. Our work provides a comprehensive framework for assessing the performance of P&ID digitization methods and opens up new avenues for research in this area using transformer architectures. The P&ID dataset used for evaluation will be published and made publicly available upon acceptance of the paper.
https://arxiv.org/abs/2411.13929
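Evaluating graph extraction as described above requires matching predicted nodes to ground-truth nodes and then scoring the predicted edges under that matching. The abstract does not give the exact protocol, so the sketch below is a hypothetical scoring scheme, assuming nodes are matched greedily by coordinate distance within a tolerance and edges are treated as undirected:

```python
import math

def match_nodes(pred, gt, tol=10.0):
    """Greedily match predicted node coordinates to ground-truth ones.

    pred, gt: dicts {node_id: (x, y)}. Returns {pred_id: gt_id} for all
    predictions within `tol` of an unused ground-truth node.
    """
    matches, used = {}, set()
    for pid, p in pred.items():
        best, best_d = None, tol
        for gid, g in gt.items():
            d = math.dist(p, g)
            if gid not in used and d <= best_d:
                best, best_d = gid, d
        if best is not None:
            matches[pid] = best
            used.add(best)
    return matches

def edge_f1(pred_edges, gt_edges, matches):
    """Edge-level F1: a predicted edge is a true positive when both
    endpoints are matched and the mapped edge exists in the ground truth."""
    mapped = {frozenset((matches[a], matches[b]))
              for a, b in pred_edges if a in matches and b in matches}
    gt = {frozenset(e) for e in gt_edges}
    tp = len(mapped & gt)
    prec = tp / len(pred_edges) if pred_edges else 0.0
    rec = tp / len(gt) if gt else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0
```

Computing such a metric on whole diagrams versus individual patches is what the patch-merging comparison above probes.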
Shadow removal and segmentation remain challenging tasks in computer vision, particularly in complex real-world scenarios. This study presents a novel approach that enhances the ShadowFormer model by incorporating Masked Autoencoder (MAE) priors and Fast Fourier Convolution (FFC) blocks, leading to significantly faster convergence and improved performance. We introduce three key innovations: (1) integration of MAE priors trained on the Places2 dataset for better context understanding, (2) adoption of Haar wavelet features for enhanced edge detection and multi-scale analysis, and (3) implementation of a modified SAM Adapter for robust shadow segmentation. Extensive experiments on the challenging DESOBA dataset demonstrate that our approach achieves state-of-the-art results, with notable improvements in both convergence speed and shadow removal quality.
https://arxiv.org/abs/2411.05747
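The Haar wavelet features mentioned above come from a decomposition that splits an image into a low-frequency approximation and three detail bands, and the detail bands are what carry edge information at each scale. A minimal sketch of one decomposition level, using the average/difference convention rather than the orthonormal 1/sqrt(2) scaling, and assuming even image dimensions:

```python
import numpy as np

def haar2d(img):
    """One level of the 2-D Haar transform.

    Returns (LL, LH, HL, HH): the approximation band plus detail bands
    that respond to horizontal, vertical, and diagonal intensity changes.
    Assumes even height and width.
    """
    a = img.astype(float)
    # 1-D Haar along rows: pairwise averages (low-pass) and differences (high-pass).
    lo = (a[:, 0::2] + a[:, 1::2]) / 2
    hi = (a[:, 0::2] - a[:, 1::2]) / 2
    # Then the same along columns of each intermediate band.
    ll = (lo[0::2] + lo[1::2]) / 2
    lh = (lo[0::2] - lo[1::2]) / 2
    hl = (hi[0::2] + hi[1::2]) / 2
    hh = (hi[0::2] - hi[1::2]) / 2
    return ll, lh, hl, hh
```

Recursing on the LL band yields the multi-scale pyramid; feeding the detail bands to the network is one way such features can sharpen edge-aware processing.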