This paper proposes SCESAME (Spectral Clustering-based Ensemble for Segment Anything Model Estimation), a novel zero-shot edge detection method based on the recently proposed Segment Anything Model (SAM). SAM is a foundation model for segmentation tasks, and one of its interesting applications is Automatic Mask Generation (AMG), which generates zero-shot segmentation masks for an entire image. AMG can be applied to edge detection, but it suffers from overdetecting edges. Edge detection with SCESAME overcomes this problem in three steps: (1) eliminating small generated masks, (2) combining masks by spectral clustering, taking into account mask positions and overlaps, and (3) removing artifacts after edge detection. We performed edge detection experiments on two datasets, BSDS500 and NYUDv2. Although our zero-shot approach is simple, the experimental results on BSDS500 showed performance almost identical to human performance and to CNN-based methods from seven years ago. In the NYUDv2 experiments, it performed almost as well as recent CNN-based methods. These results indicate that our method has the potential to be a strong baseline for future zero-shot edge detection methods. Furthermore, SCESAME is applicable not only to edge detection but also to other downstream zero-shot tasks.
https://arxiv.org/abs/2308.13779
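The first two SCESAME steps described above can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' implementation: the function names, the use of intersection-over-union as the only affinity (the abstract also mentions mask positions), and the restriction to two clusters via the Fiedler vector are all assumptions made for the example.

```python
import numpy as np

def drop_small_masks(masks, min_area):
    """Step 1: discard masks that cover fewer than min_area pixels."""
    return [m for m in masks if m.sum() >= min_area]

def iou_affinity(masks):
    """Pairwise intersection-over-union, used here as the mask-to-mask affinity."""
    n = len(masks)
    affinity = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            inter = np.logical_and(masks[i], masks[j]).sum()
            union = np.logical_or(masks[i], masks[j]).sum()
            affinity[i, j] = inter / union if union else 0.0
    return affinity

def spectral_bipartition(affinity):
    """Step 2 (two clusters only): split by the sign of the Fiedler vector."""
    laplacian = np.diag(affinity.sum(axis=1)) - affinity
    _, vecs = np.linalg.eigh(laplacian)  # eigenvalues come back in ascending order
    return (vecs[:, 1] > 0).astype(int)

def merge_masks(masks, labels):
    """Union of all masks that share a cluster label."""
    return [np.logical_or.reduce([m for m, l in zip(masks, labels) if l == k])
            for k in sorted(set(labels.tolist()))]
```

Step 3 (artifact removal after edge detection) is not shown, since the abstract does not describe it in enough detail to sketch.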
The reconstruction of textureless areas has long been a challenging problem in MVS due to the lack of reliable pixel correspondences between images. In this paper, we propose the Textureless-aware Segmentation And Correlative Refinement guided Multi-View Stereo (TSAR-MVS), a novel method that effectively tackles challenges posed by textureless areas in 3D reconstruction through filtering, refinement and segmentation. First, we implement joint hypothesis filtering, a technique that merges a confidence estimator with a disparity discontinuity detector to eliminate incorrect depth estimations. Second, to spread the pixels with confident depth, we introduce an iterative correlation refinement strategy that leverages RANSAC to generate superpixels, followed by a median filter that broadens the influence of accurately determined pixels. Finally, we present a textureless-aware segmentation method that leverages edge detection and line detection to accurately identify large textureless regions to be fitted using 3D planes. Experiments on extensive datasets demonstrate that our method significantly outperforms most non-learning methods and exhibits robustness to textureless areas while preserving fine details.
https://arxiv.org/abs/2308.09990
Most high-level computer vision tasks rely on low-level image operations as their initial processes. Operations such as edge detection, image enhancement, and super-resolution provide the foundations for higher-level image analysis. In this work we address edge detection with three main objectives in mind: simplicity, efficiency, and generalization, since current state-of-the-art (SOTA) edge detection models have grown in complexity in pursuit of better accuracy. To achieve this, we present the Tiny and Efficient Edge Detector (TEED), a light convolutional neural network with only 58K parameters, less than 0.2% of the state-of-the-art models. Training on the BIPED dataset takes less than 30 minutes, with each epoch requiring less than 5 minutes. Our proposed model is easy to train and converges quickly within the very first few epochs, while the predicted edge maps are crisp and of high quality. Additionally, we propose a new dataset to test the generalization of edge detection, which comprises samples from popular images used in edge detection and image segmentation. The source code is available in this https URL.
https://arxiv.org/abs/2308.06468
Database administrators (DBAs) play a crucial role in managing, maintaining and optimizing a database system to ensure data availability, performance, and reliability. However, it is hard and tedious for DBAs to manage a large number of database instances (e.g., millions of instances on cloud databases). Recently, large language models (LLMs) have shown great potential to understand valuable documents and accordingly generate reasonable answers. Thus, we propose D-Bot, an LLM-based database administrator that can continuously acquire database maintenance experience from textual sources and provide reasonable, well-founded, timely diagnosis and optimization advice for target databases. This paper presents a revolutionary LLM-centric framework for database maintenance, including (i) database maintenance knowledge detection from documents and tools, (ii) tree of thought reasoning for root cause analysis, and (iii) collaborative diagnosis among multiple LLMs. Our preliminary experimental results show that D-Bot can efficiently and effectively diagnose root causes, and our code is available at this http URL.
https://arxiv.org/abs/2308.05481
In this study, we tackle the challenging fine-grained edge detection task, which refers to predicting specific edges caused by reflectance, illumination, normal, and depth changes, respectively. Prior methods exploit multi-scale convolutional networks, which are limited in three aspects: (1) Convolutions are local operators, while identifying the cause of edge formation requires looking at faraway pixels. (2) Priors specific to edge causes are fixed in the prediction heads. (3) Separate networks are used for generic and fine-grained edge detection, so the constraint between them may be violated. To address these three issues, we propose a two-stage transformer-based network that sequentially predicts generic edges and fine-grained edges and has a global receptive field thanks to the attention mechanism. The prior knowledge of edge causes is formulated as four learnable cause tokens in a cause-aware decoder design. Furthermore, to encourage consistency between generic edges and fine-grained edges, an edge aggregation and alignment loss is exploited. We evaluate our method on the public benchmark BSDS-RIND and several newly derived benchmarks, and achieve new state-of-the-art results. Our code, data, and models are publicly available at this https URL.
https://arxiv.org/abs/2308.03092
The main approaches for simulating FMCW radar are based on ray tracing, which is usually computationally intensive and does not account for background noise. This work proposes a faster method for FMCW radar simulation capable of generating synthetic raw radar data using generative adversarial networks (GAN). The code and pre-trained weights are open-source and available on GitHub. This method generates 16 simultaneous chirps, which allows the generated data to be used for the further development of algorithms for processing radar data (filtering and clustering). This can increase the potential for data augmentation, e.g., by generating data in non-existent or safety-critical scenarios that are not reproducible in real life. In this work, the GAN was trained with radar measurements of a motorcycle and used to generate synthetic raw radar data of a motorcycle traveling in a straight line. For generating this data, the distance of the motorcycle and Gaussian noise are used as input to the neural network. The synthetically generated radar chirps were evaluated using the Fréchet Inception Distance (FID). Then, the Range-Azimuth (RA) map is calculated twice: (1) based on synthetic data using this GAN and (2) based on real data. Based on these RA maps, an algorithm with adaptive threshold and edge detection is used for object detection. The results have shown that the data is realistic in terms of coherent radar reflections of the motorcycle and background noise, based on the comparison of chirps, the RA maps and the object detection results. Thus, the method proposed in this work has been shown to minimize the simulation-to-reality gap for the generation of radar data.
https://arxiv.org/abs/2308.02632
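The radar abstract above mentions object detection on Range-Azimuth maps with "adaptive threshold and edge detection" but gives no further detail. The sketch below is one plausible reading, under stated assumptions: the threshold is the map mean plus `k` standard deviations, and the edge step keeps foreground cells that touch the background. Both choices and both function names are hypothetical.

```python
import numpy as np

def adaptive_threshold(ra_map, k=3.0):
    """Flag cells brighter than mean + k * std of the whole map (assumed rule)."""
    return ra_map > ra_map.mean() + k * ra_map.std()

def boundary(mask):
    """Edge cells: foreground cells with at least one 4-connected background neighbour."""
    p = np.pad(mask, 1, mode="constant", constant_values=False)
    # A cell is interior when its up, down, left and right neighbours are all foreground.
    interior = p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    return mask & ~interior
```

Because the threshold is derived from the map itself, it adapts to the overall noise floor of each frame instead of relying on a fixed constant.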
Estimating surface normals from 3D point clouds is critical for various applications, including surface reconstruction and rendering. While existing methods for normal estimation perform well in regions where normals change slowly, they tend to fail where normals vary rapidly. To address this issue, we propose a novel approach called MSECNet, which improves estimation in regions where normals vary by treating normal variation modeling as an edge detection problem. MSECNet consists of a backbone network and a multi-scale edge conditioning (MSEC) stream. The MSEC stream achieves robust edge detection through multi-scale feature fusion and adaptive edge detection. The detected edges are then combined with the output of the backbone network using the edge conditioning module to produce edge-aware representations. Extensive experiments show that MSECNet outperforms existing methods on both synthetic (PCPNet) and real-world (SceneNN) datasets while running significantly faster. We also conduct various analyses to investigate the contribution of each component in the MSEC stream. Finally, we demonstrate the effectiveness of our approach in surface reconstruction.
https://arxiv.org/abs/2308.02237
Multispectral imagery is frequently incorporated into agricultural tasks, providing valuable support for applications such as image segmentation, crop monitoring, field robotics, and yield estimation. From an image segmentation perspective, multispectral cameras can provide rich spectral information, helping with noise reduction and feature extraction. As such, this paper concentrates on the use of fusion approaches to enhance the segmentation process in agricultural applications. More specifically, in this work, we compare different fusion approaches by combining RGB and NDVI as inputs for crop row detection, which can be useful in autonomous robots operating in the field. The inputs are used individually as well as combined at different stages of the process (early and late fusion) to perform classical and deep learning (DL)-based semantic segmentation. In this study, two agriculture-related datasets are analyzed using both DL-based and classical segmentation methodologies. The experiments reveal that classical segmentation methods, utilizing techniques such as edge detection and thresholding, can effectively compete with DL-based algorithms, particularly in tasks requiring precise foreground-background separation. This suggests that traditional methods retain their efficacy in certain specialized applications within the agricultural domain. Moreover, among the fusion strategies examined, late fusion emerges as the most robust approach, demonstrating superiority in adaptability and effectiveness across varying segmentation scenarios. The dataset and code are available at this https URL.
https://arxiv.org/abs/2308.00159
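To make the RGB and NDVI fusion terminology in the entry above concrete, here is a minimal sketch. The NDVI formula is the standard one, (NIR - Red) / (NIR + Red); the function names, the small `eps` guard, and the weighted average used for late fusion are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def ndvi(nir, red, eps=1e-8):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    nir = np.asarray(nir, dtype=np.float64)
    red = np.asarray(red, dtype=np.float64)
    return (nir - red) / (nir + red + eps)  # eps avoids division by zero

def early_fusion(rgb, ndvi_map):
    """Early fusion: stack NDVI as a fourth input channel next to RGB."""
    return np.concatenate([rgb, ndvi_map[..., None]], axis=-1)

def late_fusion(prob_rgb, prob_ndvi, w=0.5):
    """Late fusion: blend the per-pixel outputs of two separate segmenters."""
    return w * prob_rgb + (1.0 - w) * prob_ndvi
```

With early fusion a single model sees all four channels at once, while late fusion keeps one model per input and only combines their predictions.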
According to theory, quantum computers have the potential to process data using remarkably fewer qubits than conventional bits. However, recent experiments have indicated that the practical feasibility of retrieving an image from its quantum encoded version is currently limited to very small image sizes. Despite this constraint, variational quantum machine learning algorithms can still be employed in the current noisy intermediate scale quantum (NISQ) era. An example is a hybrid quantum machine learning approach for edge detection. In our study, we present an application of quantum transfer learning for detecting cracks in gray value images. We compare the performance and training time of PennyLane's standard qubits with IBM's qasm_simulator and real backends, offering insights into their execution efficiency.
https://arxiv.org/abs/2307.16723
Relying on large-scale training data with pixel-level labels, previous edge detection methods have achieved high performance. However, it is hard to manually label edges accurately, especially for large datasets, and thus the datasets inevitably contain noisy labels. This label-noise issue has been studied extensively for classification, while still remaining under-explored for edge detection. To address the label-noise issue for edge detection, this paper proposes to learn Pixel-level Noise Transitions to model the label-corruption process. To achieve this, we develop a novel Pixel-wise Shift Learning (PSL) module to estimate the transition from clean to noisy labels as a displacement field. Exploiting the estimated noise transitions, our model, named PNT-Edge, is able to fit the prediction to clean labels. In addition, a local edge density regularization term is devised to exploit local structure information for better transition learning. This term encourages learning large shifts for edges with complex local structures. Experiments on SBD and Cityscapes demonstrate the effectiveness of our method in relieving the impact of label noise. Code will be available on GitHub.
https://arxiv.org/abs/2307.14070
The significance of multi-scale features has been gradually recognized by the edge detection community. However, the fusion of multi-scale features increases the complexity of the model, which is not friendly to practical application. In this work, we propose a Compact Twice Fusion Network (CTFN) to fully integrate multi-scale features while maintaining the compactness of the model. CTFN includes two lightweight multi-scale feature fusion modules: a Semantic Enhancement Module (SEM) that can utilize the semantic information contained in coarse-scale features to guide the learning of fine-scale features, and a Pseudo Pixel-level Weighting (PPW) module that aggregates the complementary merits of multi-scale features by assigning weights to all features. Even so, the interference of texture noise still makes the correct classification of some pixels a challenge. For these hard samples, we propose a novel loss function, coined Dynamic Focal Loss, which reshapes the standard cross-entropy loss and dynamically adjusts the weights to correct the distribution of hard samples. We evaluate our method on three datasets, i.e., BSDS500, NYUDv2, and BIPEDv2. Compared with state-of-the-art methods, CTFN achieves competitive accuracy with fewer parameters and lower computational cost. Apart from the backbone, CTFN requires only 0.1M additional parameters, which reduces its computation cost to just 60% of other state-of-the-art methods. The code is available at this https URL.
https://arxiv.org/abs/2307.04952
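The CTFN abstract does not give the formula for its Dynamic Focal Loss, so it is not reproduced here. As a reference point only, the sketch below shows the standard (static) focal loss, which likewise reshapes cross-entropy by down-weighting easy samples: FL = -(1 - p_t)^gamma * log(p_t). The function name and the `eps` clipping are choices made for this example.

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, eps=1e-12):
    """Standard binary focal loss per pixel; gamma = 0 recovers cross-entropy."""
    p = np.clip(np.asarray(p, dtype=np.float64), eps, 1.0 - eps)
    y = np.asarray(y)
    p_t = np.where(y == 1, p, 1.0 - p)  # probability assigned to the true class
    return -((1.0 - p_t) ** gamma) * np.log(p_t)
```

A pixel predicted at 0.9 for its true class is scaled by (0.1)^2 when gamma is 2, while a hard pixel predicted at 0.1 is scaled by (0.9)^2, so hard samples dominate the total loss.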
Existing edge-aware camouflaged object detection (COD) methods normally output the edge prediction at an early stage. However, edges are important and fundamental factors in the subsequent segmentation task. Due to the high visual similarity between camouflaged targets and their surroundings, an edge prior predicted at an early stage usually introduces erroneous foreground-background predictions and contaminates the features used for segmentation. To tackle this problem, we propose a novel Edge-aware Mirror Network (EAMNet), which models edge detection and camouflaged object segmentation as a cross refinement process. More specifically, EAMNet has a two-branch architecture, where a segmentation-induced edge aggregation module and an edge-induced integrity aggregation module are designed to cross-guide the segmentation branch and the edge detection branch. Finally, a guided-residual channel attention module, which leverages residual connections and gated convolution, better extracts structural details from low-level features. Quantitative and qualitative experiment results show that EAMNet outperforms existing cutting-edge baselines on three widely used COD datasets. Code is available at this https URL.
https://arxiv.org/abs/2307.03932
Choosing how to encode a real-world problem as a machine learning task is an important design decision in machine learning. The task of glacier calving front modeling has often been approached as a semantic segmentation task. Recent studies have shown that combining segmentation with edge detection can improve the accuracy of calving front detectors. Building on this observation, we completely rephrase the task as a contour tracing problem and propose a model for explicit contour detection that does not incorporate any dense predictions as intermediate steps. The proposed approach, called "Charting Outlines by Recurrent Adaptation" (COBRA), combines Convolutional Neural Networks (CNNs) for feature extraction and active contour models for the delineation. By training and evaluating on several large-scale datasets of Greenland's outlet glaciers, we show that this approach indeed outperforms the aforementioned methods based on segmentation and edge detection. Finally, we demonstrate that explicit contour detection has benefits over pixel-wise methods when quantifying the models' prediction uncertainties. The project page containing the code and animated model predictions can be found at this https URL.
https://arxiv.org/abs/2307.03461
Learning-based edge detection usually suffers from predicting thick edges. Through an extensive quantitative study with a new edge crispness measure, we find that noisy human-labeled edges are the main cause of thick predictions. Based on this observation, we advocate that more attention should be paid to label quality than to model design to achieve crisp edge detection. To this end, we propose an effective Canny-guided refinement of human-labeled edges whose result can be used to train crisp edge detectors. Essentially, it seeks a subset of over-detected Canny edges that best align with human labels. We show that several existing edge detectors can be turned into crisp edge detectors through training on our refined edge maps. Experiments demonstrate that deep models trained with refined edges achieve a significant boost in crispness, from 17.4% to 30.6%. With the PiDiNet backbone, our method improves ODS and OIS by 12.2% and 12.6% on the Multicue dataset, respectively, without relying on non-maximum suppression. We further conduct experiments and show the superiority of our crisp edge detection for optical flow estimation and image segmentation.
https://arxiv.org/abs/2306.15172
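The core idea of the refinement above, selecting the subset of over-detected edges that align with human labels, can be illustrated with a deliberately simple rule: keep a candidate edge pixel only if a human-labeled pixel lies within a small square neighbourhood. This is a sketch under that assumption, not the paper's alignment procedure. The candidate map is taken as given (for example from a Canny detector run with permissive thresholds), and both function names are hypothetical.

```python
import numpy as np

def dilate(mask, radius):
    """Binary dilation with a (2r+1) x (2r+1) square, built from shifted views."""
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros((h, w), dtype=bool)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out

def refine_labels(candidate_edges, human_edges, radius=1):
    """Keep thin candidate edges that fall near a human-labeled edge pixel."""
    return candidate_edges & dilate(human_edges, radius)
```

The output inherits the one-pixel width of the candidate edges while discarding candidates that no annotator marked, which is the property a crisp training target needs.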
The field of 'explainable' artificial intelligence (XAI) has produced highly cited methods that seek to make the decisions of complex machine learning (ML) methods 'understandable' to humans, for example by attributing 'importance' scores to input features. Yet, a lack of formal underpinning leaves it unclear as to what conclusions can safely be drawn from the results of a given XAI method and has also so far hindered the theoretical verification and empirical validation of XAI methods. This means that challenging non-linear problems, typically solved by deep neural networks, presently lack appropriate remedies. Here, we craft benchmark datasets for three different non-linear classification scenarios, in which the important class-conditional features are known by design, serving as ground truth explanations. Using novel quantitative metrics, we benchmark the explanation performance of a wide set of XAI methods across three deep learning model architectures. We show that popular XAI methods are often unable to significantly outperform random performance baselines and edge detection methods. Moreover, we demonstrate that explanations derived from different model architectures can be vastly different; thus, prone to misinterpretation even under controlled conditions.
https://arxiv.org/abs/2306.12816
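The XAI benchmark above uses edge detection methods as model-agnostic baselines for feature importance, without naming the detector in the abstract. As an assumed stand-in, the sketch below computes a Sobel gradient magnitude, which can be read as an "importance map" that never looks at the trained model.

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude from the 3x3 Sobel kernels, with edge-replicated borders."""
    p = np.pad(np.asarray(img, dtype=np.float64), 1, mode="edge")
    # Horizontal gradient: right column minus left column, weighted 1-2-1.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) \
       - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    # Vertical gradient: bottom row minus top row, weighted 1-2-1.
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) \
       - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return np.hypot(gx, gy)
```

If an attribution method cannot beat such a map on data with known ground-truth features, its output may reflect image structure more than model behaviour, which is the concern the abstract raises.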
Visual question answering (VQA) is a multidisciplinary research problem pursued through the practices of natural language processing and computer vision. It automatically answers natural language questions according to the content of an image. Some test questions require external knowledge to derive a solution. Such knowledge-based VQA uses various methods to retrieve features of the image and text and combines them to generate the answer. To generate knowledge-based answers, either question-dependent or image-dependent knowledge retrieval methods are used. If knowledge about all the objects in the image is derived, not all of it is relevant to the question. On the other hand, using only question-related knowledge may lead to incorrect answers and to an overtrained model that answers questions irrelevant to the image. Our proposed method takes image attributes and question features as input to a knowledge derivation module and retrieves only question-relevant knowledge about image objects, which can provide accurate answers.
https://arxiv.org/abs/2306.04938
To incorporate spatial (neighborhood) and bidirectional hierarchical relationships as well as features and priors of the samples into their classification, we formulated the classification problem on three variants of multiresolution neighborhood graphs and the graph of a hierarchical conditional random field. Each of these graphs was weighted and undirected and could thus incorporate the spatial or hierarchical relationships in all directions. In addition, each variant of the proposed neighborhood graphs was composed of a spatial feature-based subgraph and an aspatial prior-based subgraph. It expanded on a random walker graph by using novel mechanisms to derive the edge weights of its spatial feature-based subgraph. These mechanisms included implicit and explicit edge detection to enhance the detection of weak boundaries between different classes in the spatial domain. The implicit edge detection relied on the outlier detection capability of Tukey's function and the classification reliabilities of the samples estimated by a hierarchical random forest classifier. A similar mechanism was used to derive the edge weights and thus the energy function of the hierarchical conditional random field. This way, the classification problem boiled down to a system of linear equations and a minimization of the energy function, both of which could be done via fast and efficient techniques.
https://arxiv.org/abs/2306.02143
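The abstract above relies on the outlier-rejecting behaviour of Tukey's function when deriving graph edge weights. The sketch below shows Tukey's biweight, w(r) = (1 - (r/c)^2)^2 for |r| < c and 0 otherwise, applied to feature differences between neighbouring samples. How the paper maps features and classifier reliabilities to `r` and `c` is not stated in the abstract, so the 1D signal and the cutoff here are assumptions for illustration.

```python
import numpy as np

def tukey_biweight(r, c):
    """Tukey's biweight: smooth weight in [0, 1] that is exactly 0 for |r| >= c."""
    r = np.asarray(r, dtype=np.float64)
    w = (1.0 - (r / c) ** 2) ** 2
    return np.where(np.abs(r) < c, w, 0.0)

def neighbour_weights(signal, c):
    """Edge weights between consecutive samples of a 1D signal."""
    return tukey_biweight(np.diff(np.asarray(signal, dtype=np.float64)), c)
```

A large jump between neighbours is treated as an outlier and receives zero weight, which cuts the graph at that boundary and acts as an implicit edge detector.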
Animal pose estimation has become a crucial area of research, but the scarcity of annotated data is a significant challenge in developing accurate models. Synthetic data has emerged as a promising alternative, but it frequently exhibits domain discrepancies with real data. Style transfer algorithms have been proposed to address this issue, but they suffer from insufficient spatial correspondence, leading to the loss of label information. In this work, we present a new approach called Synthetic Pose-aware Animal ControlNet (SPAC-Net), which incorporates ControlNet into the previously proposed Prior-Aware Synthetic animal data generation (PASyn) pipeline. We leverage the plausible pose data generated by the Variational Auto-Encoder (VAE)-based data generation pipeline as input for the ControlNet Holistically-nested Edge Detection (HED) boundary task model to generate synthetic data with pose labels that are closer to real data, making it possible to train a high-precision pose estimation network without the need for real data. In addition, we propose the Bi-ControlNet structure to separately detect the HED boundary of animals and backgrounds, improving the precision and stability of the generated data. Using the SPAC-Net pipeline, we generate synthetic zebra and rhino images and test them on the AP10K real dataset, demonstrating superior performance compared to using only real images or synthetic data generated by other methods. Our work demonstrates the potential for synthetic data to overcome the challenge of limited annotated data in animal pose estimation.
https://arxiv.org/abs/2305.17845
We develop two novel vision methods for planning effective grasps for clear plastic bags, as well as a control method to enable a Sawyer arm with a parallel gripper to execute the grasps. The first vision method is based on classical image processing and heuristics (e.g., Canny edge detection) to select a grasp target and angle. The second uses a deep-learning model trained on a human-labeled data set to mimic human grasp decisions. A clustering algorithm is used to de-noise the outputs of each vision method. Subsequently, a workspace PD control method is used to execute each grasp. Of the two vision methods, we find the deep-learning based method to be more effective.
https://arxiv.org/abs/2305.07631
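The grasping entry above executes each grasp with a workspace PD control method. A minimal sketch of a PD law in Cartesian space follows. It simulates a unit point mass instead of a real arm, so the dynamics, the gains, and the function name are all assumptions; nothing here is the Sawyer API.

```python
import numpy as np

def pd_step(x, v, x_target, kp, kd, dt):
    """One semi-implicit Euler step of a workspace PD law on a unit point mass."""
    # Proportional term pulls toward the target, derivative term damps velocity.
    a = kp * (x_target - x) - kd * v
    v = v + a * dt
    x = x + v * dt
    return x, v

def run_to_target(x0, x_target, kp=25.0, kd=10.0, dt=0.01, steps=1000):
    """Drive the point from x0 toward x_target and return the final position."""
    x = np.asarray(x0, dtype=np.float64)
    v = np.zeros_like(x)
    target = np.asarray(x_target, dtype=np.float64)
    for _ in range(steps):
        x, v = pd_step(x, v, target, kp, kd, dt)
    return x
```

The default gains satisfy kd = 2 * sqrt(kp), which is critical damping for a unit mass, so the simulated point approaches the grasp target without overshoot.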
The Segment Anything Model (SAM) developed by Meta AI Research has recently attracted significant attention. Trained on a large segmentation dataset of over 1 billion masks, SAM is capable of segmenting any object in a given image. In the original SAM work, the authors turned to zero-shot transfer tasks (like edge detection) to evaluate the performance of SAM. Recently, numerous works have attempted to investigate the performance of SAM in various scenarios to recognize and segment objects. Moreover, numerous projects have emerged to show the versatility of SAM as a foundation model by combining it with other models, like Grounding DINO, Stable Diffusion, ChatGPT, etc. With the relevant papers and projects increasing exponentially, it is challenging for readers to catch up with the development of SAM. To this end, this work conducts the first comprehensive survey on SAM. This is an ongoing project and we intend to update the manuscript on a regular basis. Therefore, readers are welcome to contact us if they complete new works related to SAM so that we can include them in our next version.
https://arxiv.org/abs/2306.06211