Efficient polyp segmentation in healthcare plays a critical role in enabling early diagnosis of colorectal cancer. However, polyp segmentation presents numerous challenges, including intricate background distributions, variations in polyp size and shape, and indistinct boundaries. Defining the boundary between the foreground (i.e., the polyp itself) and the background (the surrounding tissue) is difficult. To mitigate these challenges, we propose the Multi-Scale Edge-Guided Attention Network (MEGANet), tailored specifically for polyp segmentation in colonoscopy images. The network draws inspiration from the fusion of a classical edge detection technique with an attention mechanism. By combining these techniques, MEGANet effectively preserves high-frequency information, notably edges and boundaries, which tends to erode as neural networks deepen. MEGANet is designed as an end-to-end framework with three key modules: an encoder, which captures and abstracts features from the input image; a decoder, which focuses on salient features; and the Edge-Guided Attention (EGA) module, which employs the Laplacian operator to accentuate polyp boundaries. Extensive qualitative and quantitative experiments on five benchmark datasets demonstrate that MEGANet outperforms existing SOTA methods under six evaluation metrics. Our code is available at this https URL.
https://arxiv.org/abs/2309.03329
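The MEGANet abstract says its EGA module relies on the Laplacian operator to accentuate boundaries. As a minimal sketch of that classical operator only (not of the EGA module itself, whose details are not given here), the following applies a 4-neighbour discrete Laplacian to a small grayscale image. The function name `laplacian` and the choice to leave border pixels at zero are assumptions made for illustration.

```python
# Hypothetical helper: 4-neighbour discrete Laplacian on a 2-D grayscale image
# stored as a list of rows. Border pixels are left at 0.0 for simplicity.
def laplacian(image):
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            out[y][x] = float(
                image[y - 1][x] + image[y + 1][x]
                + image[y][x - 1] + image[y][x + 1]
                - 4 * image[y][x]
            )
    return out


# A flat dark region next to a flat bright region, i.e. a vertical step edge.
image = [[0, 0, 0, 10, 10, 10] for _ in range(5)]
response = laplacian(image)
print(response[2])  # [0.0, 0.0, 10.0, -10.0, 0.0, 0.0]
```

The response is zero in the flat regions and non-zero only on either side of the step, which is the high-frequency boundary information the abstract refers to.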
This paper presents Deep Networks for Improved Segmentation Edges (DeNISE), a novel data enhancement technique using edge detection and segmentation models to improve the boundary quality of segmentation masks. DeNISE utilizes the inherent differences in two sequential deep neural architectures to improve the accuracy of the predicted segmentation edge. DeNISE applies to all types of neural networks and is not trained end-to-end, allowing rapid experiments to discover which models complement each other. We test and apply DeNISE for building segmentation in aerial images. Aerial images are known for difficult conditions as they have a low resolution with optical noise, such as reflections, shadows, and visual obstructions. Overall the paper demonstrates the potential for DeNISE. Using the technique, we improve the baseline results with a building IoU of 78.9%.
https://arxiv.org/abs/2309.02091
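The DeNISE abstract reports its result as a building IoU. For readers unfamiliar with the metric, here is a minimal sketch of intersection over union for two binary masks. The function name `iou`, the toy masks, and the convention of returning 1.0 when both masks are empty are illustrative assumptions, not taken from the paper.

```python
# Intersection over union (IoU) for two binary masks given as lists of rows.
def iou(pred, target):
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(iou(pred, target))  # 0.5 (2 shared pixels out of 4 in the union)
```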
We introduce a novel approach for image edge detection based on pseudo-Boolean polynomials for image patches. We show that patches covering edge regions in the image result in pseudo-Boolean polynomials with higher degrees compared to patches that cover blob regions. The proposed approach is based on reduction of polynomial degree and equivalence properties of penalty-based pseudo-Boolean polynomials.
https://arxiv.org/abs/2308.15557
We introduce a deterministic approach to edge detection and image segmentation by formulating pseudo-Boolean polynomials on image patches. The approach works by applying a binary classification of blob and edge regions in an image based on the degrees of pseudo-Boolean polynomials calculated on patches extracted from the provided image. We test our method on simple images containing primitive shapes of constant and contrasting colour and establish the feasibility before applying it to complex instances like aerial landscape images. The proposed method is based on the exploitation of the reduction, polynomial degree, and equivalence properties of penalty-based pseudo-Boolean polynomials.
https://arxiv.org/abs/2308.15453
Edge detection, a core component of a wide range of vision-oriented tasks, aims to identify object boundaries and prominent edges in natural images. For practical use, an edge detector should be both efficient and accurate. To achieve this goal, two key issues must be addressed: 1) how to liberate deep edge models from the inefficient pre-trained backbones used by most existing deep learning methods, in order to save computational cost and reduce model size; and 2) how to mitigate the negative influence of noisy or even wrong labels in the training data, which are widespread in edge detection owing to the subjectivity and ambiguity of annotators, in order to improve robustness and accuracy. In this paper, we attempt to address both problems simultaneously by developing a collaborative-learning-based model, termed PEdger. The principle behind PEdger is that the information learned from different training moments and heterogeneous architectures (recurrent and non-recurrent in this work) can be assembled to explore robust knowledge against noisy annotations, even without the help of pre-training on extra data. Extensive ablation studies, together with quantitative and qualitative experimental comparisons on the BSDS500 and NYUD datasets, verify the effectiveness of our design and demonstrate its superiority over competitors in terms of accuracy, speed, and model size. Code can be found at this https URL.
https://arxiv.org/abs/2308.14084
This paper proposes a novel zero-shot edge detection method, SCESAME (Spectral Clustering-based Ensemble for Segment Anything Model Estimation), built on the recently proposed Segment Anything Model (SAM). SAM is a foundation model for segmentation tasks, and one of its interesting applications is Automatic Mask Generation (AMG), which generates zero-shot segmentation masks for an entire image. AMG can be applied to edge detection, but it suffers from overdetecting edges. Edge detection with SCESAME overcomes this problem in three steps: (1) eliminating small generated masks, (2) combining masks by spectral clustering, taking into account mask positions and overlaps, and (3) removing artifacts after edge detection. We performed edge detection experiments on two datasets, BSDS500 and NYUDv2. Although our zero-shot approach is simple, the experimental results on BSDS500 showed performance almost identical to human performance and to CNN-based methods from seven years ago. In the NYUDv2 experiments, it performed almost as well as recent CNN-based methods. These results indicate that our method has the potential to be a strong baseline for future zero-shot edge detection methods. Furthermore, SCESAME is applicable not only to edge detection but also to other downstream zero-shot tasks.
https://arxiv.org/abs/2308.13779
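The first SCESAME step (eliminating small generated masks) and the idea of reading edges off segmentation masks can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the names `drop_small_masks` and `mask_boundary`, the `min_area` threshold, and the 4-neighbour boundary rule are all hypothetical, and the spectral clustering and artifact removal steps are omitted.

```python
# Masks are binary images stored as lists of rows (1 = inside the mask).
def mask_area(mask):
    return sum(sum(row) for row in mask)


def drop_small_masks(masks, min_area):
    # Step (1) in the abstract: discard masks below an (assumed) area threshold.
    return [mask for mask in masks if mask_area(mask) >= min_area]


def mask_boundary(mask):
    # Assumed rule: a mask pixel is an edge pixel if any in-image
    # 4-neighbour lies outside the mask.
    height, width = len(mask), len(mask[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not mask[y][x]:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < height and 0 <= nx < width and not mask[ny][nx]:
                    edges[y][x] = 1
                    break
    return edges


# A 3x3 block inside a 5x5 image, plus a single-pixel mask to be discarded.
big = [[1 if 1 <= y <= 3 and 1 <= x <= 3 else 0 for x in range(5)] for y in range(5)]
tiny = [[1 if (y, x) == (0, 0) else 0 for x in range(5)] for y in range(5)]

kept = drop_small_masks([big, tiny], min_area=4)
edges = mask_boundary(kept[0])
for row in edges:
    print(row)
```

Only the ring around the block's centre pixel is marked, so the edge map traces the outline of the surviving mask.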
The reconstruction of textureless areas has long been a challenging problem in MVS due to the lack of reliable pixel correspondences between images. In this paper, we propose Textureless-aware Segmentation And Correlative Refinement guided Multi-View Stereo (TSAR-MVS), a novel method that effectively tackles the challenges posed by textureless areas in 3D reconstruction through filtering, refinement, and segmentation. First, we implement joint hypothesis filtering, a technique that merges a confidence estimator with a disparity discontinuity detector to eliminate incorrect depth estimates. Second, to spread the depth of confidently estimated pixels, we introduce an iterative correlation refinement strategy that leverages RANSAC to generate superpixels, followed by a median filter that broadens the influence of accurately determined pixels. Finally, we present a textureless-aware segmentation method that leverages edge detection and line detection to accurately identify large textureless regions, which are then fitted using 3D planes. Experiments on extensive datasets demonstrate that our method significantly outperforms most non-learning methods and is robust to textureless areas while preserving fine details.
https://arxiv.org/abs/2308.09990
Most high-level computer vision tasks rely on low-level image operations as their initial processes. Operations such as edge detection, image enhancement, and super-resolution provide the foundations for higher-level image analysis. In this work, we address edge detection with three main objectives in mind: simplicity, efficiency, and generalization, since current state-of-the-art (SOTA) edge detection models have grown in complexity in pursuit of better accuracy. To this end, we present the Tiny and Efficient Edge Detector (TEED), a light convolutional neural network with only 58K parameters, less than 0.2% of the state-of-the-art models. Training on the BIPED dataset takes less than 30 minutes, with each epoch requiring less than 5 minutes. Our proposed model is easy to train and converges within the first few epochs, while the predicted edge maps are crisp and of high quality. Additionally, we propose a new dataset to test the generalization of edge detection, which comprises samples from popular images used in edge detection and image segmentation. The source code is available at this https URL.
https://arxiv.org/abs/2308.06468
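The TEED size claim implies a lower bound on the size of the models it is compared against. As a quick arithmetic check, assuming the 0.2% figure refers to parameter counts (the abstract does not say so explicitly), those models would have more than 29 million parameters:

```python
# Assumption: "less than 0.2%" compares parameter counts.
teed_params = 58_000   # 58K parameters, from the abstract
ratio_per_mille = 2    # 0.2% expressed as 2 per 1000

# If teed_params < 0.2% of sota_params, then sota_params > teed_params / 0.002.
# Integer arithmetic avoids floating-point rounding in the division.
sota_lower_bound = teed_params * 1000 // ratio_per_mille
print(sota_lower_bound)  # 29000000
```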
Database administrators (DBAs) play a crucial role in managing, maintaining, and optimizing a database system to ensure data availability, performance, and reliability. However, it is hard and tedious for DBAs to manage a large number of database instances (e.g., millions of instances on cloud databases). Recently, large language models (LLMs) have shown great potential to understand valuable documents and generate reasonable answers accordingly. We therefore propose D-Bot, an LLM-based database administrator that can continuously acquire database maintenance experience from textual sources and provide reasonable, well-founded, and timely diagnosis and optimization advice for target databases. This paper presents a revolutionary LLM-centric framework for database maintenance, including (i) database maintenance knowledge detection from documents and tools, (ii) tree-of-thought reasoning for root cause analysis, and (iii) collaborative diagnosis among multiple LLMs. Our preliminary experimental results show that D-Bot can efficiently and effectively diagnose root causes. Our code is available at this http URL.
https://arxiv.org/abs/2308.05481
In this study, we tackle the challenging fine-grained edge detection task, which refers to predicting specific edges caused by reflectance, illumination, normal, and depth changes, respectively. Prior methods exploit multi-scale convolutional networks, which are limited in three respects: (1) convolutions are local operators, whereas identifying the cause of edge formation requires looking at faraway pixels; (2) priors specific to edge causes are fixed in the prediction heads; and (3) separate networks are used for generic and fine-grained edge detection, so the constraint between them may be violated. To address these three issues, we propose a two-stage transformer-based network that sequentially predicts generic edges and fine-grained edges and has a global receptive field thanks to the attention mechanism. The prior knowledge of edge causes is formulated as four learnable cause tokens in a cause-aware decoder design. Furthermore, to encourage consistency between generic edges and fine-grained edges, an edge aggregation and alignment loss is exploited. We evaluate our method on the public benchmark BSDS-RIND and several newly derived benchmarks, and achieve new state-of-the-art results. Our code, data, and models are publicly available at this https URL.
https://arxiv.org/abs/2308.03092
The main approaches for simulating FMCW radar are based on ray tracing, which is usually computationally intensive and does not account for background noise. This work proposes a faster method for FMCW radar simulation capable of generating synthetic raw radar data using generative adversarial networks (GAN). The code and pre-trained weights are open-source and available on GitHub. This method generates 16 simultaneous chirps, which allows the generated data to be used for the further development of algorithms for processing radar data (filtering and clustering). This can increase the potential for data augmentation, e.g., by generating data in non-existent or safety-critical scenarios that are not reproducible in real life. In this work, the GAN was trained with radar measurements of a motorcycle and used to generate synthetic raw radar data of a motorcycle traveling in a straight line. To generate this data, the distance of the motorcycle and Gaussian noise are used as input to the neural network. The synthetic radar chirps were evaluated using the Fréchet Inception Distance (FID). Then, the Range-Azimuth (RA) map is calculated twice: first from synthetic data generated by this GAN, and second from real data. Based on these RA maps, an algorithm with adaptive thresholding and edge detection is used for object detection. A comparison of the chirps, the RA maps, and the object detection results shows that the data is realistic in terms of both the coherent radar reflections of the motorcycle and the background noise. Thus, the proposed method has been shown to minimize the simulation-to-reality gap for the generation of radar data.
https://arxiv.org/abs/2308.02632
Estimating surface normals from 3D point clouds is critical for various applications, including surface reconstruction and rendering. While existing methods for normal estimation perform well in regions where normals change slowly, they tend to fail where normals vary rapidly. To address this issue, we propose a novel approach called MSECNet, which improves estimation in normal varying regions by treating normal variation modeling as an edge detection problem. MSECNet consists of a backbone network and a multi-scale edge conditioning (MSEC) stream. The MSEC stream achieves robust edge detection through multi-scale feature fusion and adaptive edge detection. The detected edges are then combined with the output of the backbone network using the edge conditioning module to produce edge-aware representations. Extensive experiments show that MSECNet outperforms existing methods on both synthetic (PCPNet) and real-world (SceneNN) datasets while running significantly faster. We also conduct various analyses to investigate the contribution of each component in the MSEC stream. Finally, we demonstrate the effectiveness of our approach in surface reconstruction.
https://arxiv.org/abs/2308.02237
Multispectral imagery is frequently incorporated into agricultural tasks, providing valuable support for applications such as image segmentation, crop monitoring, field robotics, and yield estimation. From an image segmentation perspective, multispectral cameras can provide rich spectral information, helping with noise reduction and feature extraction. As such, this paper concentrates on the use of fusion approaches to enhance the segmentation process in agricultural applications. More specifically, we compare different fusion approaches by combining RGB and NDVI as inputs for crop row detection, which can be useful for autonomous robots operating in the field. The inputs are used individually as well as combined at different stages of the process (early and late fusion) to perform classical and DL-based semantic segmentation. Two agriculture-related datasets are analyzed using both deep learning (DL)-based and classical segmentation methodologies. The experiments reveal that classical segmentation methods, utilizing techniques such as edge detection and thresholding, can effectively compete with DL-based algorithms, particularly in tasks requiring precise foreground-background separation. This suggests that traditional methods retain their efficacy in certain specialized applications within the agricultural domain. Moreover, among the fusion strategies examined, late fusion emerges as the most robust approach, demonstrating superior adaptability and effectiveness across varying segmentation scenarios. The dataset and code are available at this https URL.
https://arxiv.org/abs/2308.00159
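The crop row abstract contrasts early and late fusion of RGB and NDVI inputs. A minimal per-pixel sketch of the two ideas follows. NDVI is computed with its standard definition, (NIR - Red) / (NIR + Red); the function names, the fixed 0.5 weight, and the use of a weighted average for late fusion are illustrative assumptions, not the paper's pipeline.

```python
# Normalized Difference Vegetation Index for one pixel.
def ndvi(nir, red):
    denominator = nir + red
    # Assumed convention: return 0.0 where both bands are zero.
    return (nir - red) / denominator if denominator else 0.0


def early_fusion(rgb_pixel, ndvi_value):
    # Early fusion: stack the inputs into one feature vector before any model.
    return (*rgb_pixel, ndvi_value)


def late_fusion(prob_rgb, prob_ndvi, weight=0.5):
    # Late fusion: combine the outputs of two separate (hypothetical) models,
    # here as a weighted average of their foreground probabilities.
    return weight * prob_rgb + (1 - weight) * prob_ndvi


value = ndvi(nir=0.8, red=0.2)
features = early_fusion((0.1, 0.5, 0.2), value)
fused_probability = late_fusion(prob_rgb=0.9, prob_ndvi=0.5)
print(len(features), round(value, 3), round(fused_probability, 3))
```

In the early variant a single model sees a four-channel input, whereas in the late variant each input gets its own model and only the predictions are merged.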
In theory, quantum computers have the potential to process data using far fewer qubits than the number of conventional bits required. However, recent experiments have indicated that retrieving an image from its quantum-encoded version is currently feasible only for very small image sizes. Despite this constraint, variational quantum machine learning algorithms can still be employed in the current noisy intermediate-scale quantum (NISQ) era. An example is a hybrid quantum machine learning approach for edge detection. In our study, we present an application of quantum transfer learning for detecting cracks in gray value images. We compare the performance and training time of PennyLane's standard qubits with IBM's qasm_simulator and real backends, offering insights into their execution efficiency.
https://arxiv.org/abs/2307.16723
Relying on large-scale training data with pixel-level labels, previous edge detection methods have achieved high performance. However, it is hard to manually label edges accurately, especially for large datasets, and thus the datasets inevitably contain noisy labels. This label-noise issue has been studied extensively for classification but remains under-explored for edge detection. To address the label-noise issue for edge detection, this paper proposes to learn Pixel-level Noise Transitions to model the label-corruption process. To achieve this, we develop a novel Pixel-wise Shift Learning (PSL) module to estimate the transition from clean to noisy labels as a displacement field. Exploiting the estimated noise transitions, our model, named PNT-Edge, is able to fit its predictions to clean labels. In addition, a local edge density regularization term is devised to exploit local structure information for better transition learning. This term encourages learning large shifts for edges with complex local structures. Experiments on SBD and Cityscapes demonstrate the effectiveness of our method in relieving the impact of label noise. Code will be available on GitHub.
https://arxiv.org/abs/2307.14070
The significance of multi-scale features has been gradually recognized by the edge detection community. However, the fusion of multi-scale features increases the complexity of the model, which is not friendly to practical application. In this work, we propose a Compact Twice Fusion Network (CTFN) to fully integrate multi-scale features while maintaining the compactness of the model. CTFN includes two lightweight multi-scale feature fusion modules: a Semantic Enhancement Module (SEM), which utilizes the semantic information contained in coarse-scale features to guide the learning of fine-scale features, and a Pseudo Pixel-level Weighting (PPW) module, which aggregates the complementary merits of multi-scale features by assigning weights to all features. Even so, the interference of texture noise makes the correct classification of some pixels a challenge. For these hard samples, we propose a novel loss function, coined Dynamic Focal Loss, which reshapes the standard cross-entropy loss and dynamically adjusts the weights to correct the distribution of hard samples. We evaluate our method on three datasets: BSDS500, NYUDv2, and BIPEDv2. Compared with state-of-the-art methods, CTFN achieves competitive accuracy with fewer parameters and lower computational cost. Apart from the backbone, CTFN requires only 0.1M additional parameters, which reduces its computation cost to just 60% of that of other state-of-the-art methods. The code is available at this https URL.
https://arxiv.org/abs/2307.04952
Existing edge-aware camouflaged object detection (COD) methods normally output the edge prediction at an early stage. However, edges are important and fundamental factors in the subsequent segmentation task. Due to the high visual similarity between camouflaged targets and their surroundings, an edge prior predicted at an early stage usually introduces erroneous foreground-background predictions and contaminates the features used for segmentation. To tackle this problem, we propose a novel Edge-aware Mirror Network (EAMNet), which models edge detection and camouflaged object segmentation as a cross-refinement process. More specifically, EAMNet has a two-branch architecture, in which a segmentation-induced edge aggregation module and an edge-induced integrity aggregation module are designed to cross-guide the segmentation branch and the edge detection branch. Finally, a guided-residual channel attention module, which leverages residual connections and gated convolution, better extracts structural details from low-level features. Quantitative and qualitative experimental results show that EAMNet outperforms existing cutting-edge baselines on three widely used COD datasets. Code is available at this https URL.
https://arxiv.org/abs/2307.03932
Choosing how to encode a real-world problem as a machine learning task is an important design decision in machine learning. The task of glacier calving front modeling has often been approached as a semantic segmentation task. Recent studies have shown that combining segmentation with edge detection can improve the accuracy of calving front detectors. Building on this observation, we completely rephrase the task as a contour tracing problem and propose a model for explicit contour detection that does not incorporate any dense predictions as intermediate steps. The proposed approach, called "Charting Outlines by Recurrent Adaptation" (COBRA), combines Convolutional Neural Networks (CNNs) for feature extraction and active contour models for the delineation. By training and evaluating on several large-scale datasets of Greenland's outlet glaciers, we show that this approach indeed outperforms the aforementioned methods based on segmentation and edge detection. Finally, we demonstrate that explicit contour detection has benefits over pixel-wise methods when quantifying the models' prediction uncertainties. The project page containing the code and animated model predictions can be found at this https URL.
https://arxiv.org/abs/2307.03461
Learning-based edge detection usually suffers from predicting thick edges. Through an extensive quantitative study with a new edge crispness measure, we find that noisy human-labeled edges are the main cause of thick predictions. Based on this observation, we advocate that more attention should be paid to label quality than to model design to achieve crisp edge detection. To this end, we propose an effective Canny-guided refinement of human-labeled edges, whose result can be used to train crisp edge detectors. Essentially, it seeks a subset of over-detected Canny edges that best aligns with the human labels. We show that several existing edge detectors can be turned into crisp edge detectors through training on our refined edge maps. Experiments demonstrate that deep models trained with refined edges achieve a significant boost in crispness, from 17.4% to 30.6%. With the PiDiNet backbone, our method improves ODS and OIS by 12.2% and 12.6%, respectively, on the Multicue dataset, without relying on non-maximum suppression. We further conduct experiments that show the superiority of our crisp edge detection for optical flow estimation and image segmentation.
https://arxiv.org/abs/2306.15172
The field of 'explainable' artificial intelligence (XAI) has produced highly cited methods that seek to make the decisions of complex machine learning (ML) methods 'understandable' to humans, for example by attributing 'importance' scores to input features. Yet, a lack of formal underpinning leaves it unclear as to what conclusions can safely be drawn from the results of a given XAI method and has also so far hindered the theoretical verification and empirical validation of XAI methods. This means that challenging non-linear problems, typically solved by deep neural networks, presently lack appropriate remedies. Here, we craft benchmark datasets for three different non-linear classification scenarios, in which the important class-conditional features are known by design, serving as ground truth explanations. Using novel quantitative metrics, we benchmark the explanation performance of a wide set of XAI methods across three deep learning model architectures. We show that popular XAI methods are often unable to significantly outperform random performance baselines and edge detection methods. Moreover, we demonstrate that explanations derived from different model architectures can be vastly different; thus, prone to misinterpretation even under controlled conditions.
https://arxiv.org/abs/2306.12816