Numerical difference computation is a core, indispensable tool of the modern digital era. Tao general difference (TGD) is a novel theory and approach to difference computation for discrete sequences and arrays in multidimensional space. Built on the solid theoretical foundation of the general difference over a finite interval, TGD operators demonstrate exceptional signal-processing capabilities in real-world applications. A novel smoothness property of a sequence is defined on the first- and second-order TGD. This property is used to denoise one-dimensional signals, where noise manifests as the non-smooth points in the sequence. Meanwhile, the center of the gradient within a finite interval can be accurately located via TGD calculation. This solves a traditional challenge in computer vision: the precise, noise-robust localization of image edges. Furthermore, the power of TGD operators extends to spatio-temporal edge detection in three-dimensional arrays, enabling the identification of kinetic edges in video data. These diverse applications highlight the properties of TGD in the discrete domain and its significant promise for computation across signal processing, image analysis, and video analytics.
https://arxiv.org/abs/2401.15287
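The exact TGD operators are defined in the paper itself; as a rough classical analogue of the smoothness test, ordinary first- and second-order finite differences already let one flag "non-smooth" points in a sequence. The robust threshold below (`k` times the median absolute deviation) and the function name are illustrative assumptions, not the TGD criterion:

```python
import numpy as np

def non_smooth_points(x, k=3.0):
    """Flag points whose second finite difference is an outlier.

    A crude stand-in for a TGD-style smoothness test: a point is
    'non-smooth' when its second difference deviates from the
    sequence's typical variation by more than k robust std-devs.
    """
    d2 = np.zeros_like(x, dtype=float)
    d2[1:-1] = x[2:] - 2.0 * x[1:-1] + x[:-2]    # second difference
    mad = np.median(np.abs(d2 - np.median(d2))) + 1e-12
    return np.abs(d2) > k * 1.4826 * mad          # robust z-score test

# A smooth ramp with one noise spike injected at index 50:
t = np.linspace(0.0, 1.0, 101)
signal = t.copy()
signal[50] += 1.0
mask = non_smooth_points(signal)                  # flags the spike region
```

Denoising in this spirit would then amount to replacing the flagged points, e.g. by interpolating from their smooth neighbors.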
This review presents various image segmentation methods that use complex networks. Image segmentation is one of the important steps in image analysis, as it helps analyze and understand complex images. We first attempt to classify complex-network approaches according to how they are used in image segmentation. In computer vision and image processing applications, image segmentation is essential for analyzing complex images with irregular shapes, textures, or overlapping boundaries. Advanced algorithms make use of machine learning, clustering, edge detection, and region-growing techniques. Graph theory principles combined with community-detection-based methods allow for more precise analysis and interpretation of complex images. Hybrid approaches combine multiple techniques for comprehensive, robust segmentation, improving results in computer vision and image processing tasks.
https://arxiv.org/abs/2401.02758
Edge detection is a fundamental technique in various computer vision tasks. Edges are effectively delineated by pixel discontinuities and can offer reliable structural information even in textureless areas. State-of-the-art methods rely heavily on pixel-wise annotations, which are labor-intensive and subject to inconsistencies when acquired manually. In this work, we propose a novel self-supervised approach for edge detection that employs a multi-level, multi-homography technique to transfer annotations from synthetic to real-world datasets. To fully leverage the generated edge annotations, we developed SuperEdge, a streamlined yet efficient model capable of concurrently extracting edges at pixel-level and object-level granularity. Thanks to self-supervised training, our method eliminates the dependency on manually annotated edge labels, thereby enhancing its generalizability across diverse datasets. Comparative evaluations reveal that SuperEdge advances edge detection, demonstrating improvements of 4.9% in ODS and 3.3% in OIS over the existing STEdge method on BIPEDv2.
https://arxiv.org/abs/2401.02313
Limited by the encoder-decoder architecture, learning-based edge detectors usually have difficulty predicting edge maps that satisfy both correctness and crispness. With the recent success of the diffusion probabilistic model (DPM), we found it especially suitable for accurate and crisp edge detection, since the denoising process is applied directly at the original image size. Therefore, we propose the first diffusion model for the task of general edge detection, which we call DiffusionEdge. To avoid expensive computation while retaining the final performance, we apply the DPM in latent space and enable the classic cross-entropy loss, which is uncertainty-aware at the pixel level, to directly optimize the parameters in latent space in a distillation manner. We also adopt a decoupled architecture to speed up the denoising process and propose a corresponding adaptive Fourier filter to adjust the latent features of specific frequencies. With all these technical designs, DiffusionEdge can be stably trained with limited resources, predicting crisp and accurate edge maps with far fewer augmentation strategies. Extensive experiments on four edge detection benchmarks demonstrate the superiority of DiffusionEdge in both correctness and crispness. On the NYUDv2 dataset, compared to the second best, we increase the ODS, OIS (without post-processing) and AC by 30.2%, 28.1% and 65.1%, respectively. Code: this https URL.
https://arxiv.org/abs/2401.02032
The proposed architecture, Dual Attentive U-Net with Feature Infusion (DAU-FI Net), addresses challenges in semantic segmentation, particularly on multiclass imbalanced datasets with limited samples. DAU-FI Net integrates multiscale spatial-channel attention mechanisms and feature injection to enhance precision in object localization. The core employs a multiscale depth-separable convolution block, capturing localized patterns across scales. This block is complemented by a spatial-channel squeeze and excitation (scSE) attention unit, modeling inter-dependencies between channels and spatial regions in feature maps. Additionally, additive attention gates refine segmentation by connecting encoder-decoder pathways. To augment the model, engineered features, using Gabor filters for textural analysis and Sobel and Canny filters for edge detection, are injected under the guidance of semantic masks to expand the feature space strategically. Comprehensive experiments on a challenging sewer pipe and culvert defect dataset and a benchmark dataset validate DAU-FI Net's capabilities. Ablation studies highlight incremental benefits from the attention blocks and feature injection. DAU-FI Net achieves state-of-the-art mean Intersection over Union (IoU) of 95.6% and 98.8% on the defect test set and the benchmark, surpassing prior methods by 8.9% and 12.6%, respectively. The proposed architecture provides a robust solution, advancing semantic segmentation for multiclass problems with limited training data. Our sewer-culvert defects dataset, featuring pixel-level annotations, opens avenues for further research in this crucial domain. Overall, this work delivers key innovations in architecture, attention, and feature engineering to elevate semantic segmentation efficacy.
https://arxiv.org/abs/2312.14053
Cable-based arrestment systems are integral to the launch and recovery of aircraft onboard carriers and on expeditionary land-based installations. These modern arrestment systems rely on various mechanisms to absorb energy from an aircraft during an arrestment cycle to bring the aircraft to a full stop. One of the primary components of this system is the cable interface to the engine. The formation of slack in the cable at this interface can result in reduced efficiency and drives maintenance efforts to remove the slack prior to continued operations. In this paper, a machine vision based slack detection system is presented. A situational awareness camera is utilized to collect video data of the cable interface region; machine vision algorithms are applied to reduce noise, remove background clutter, focus on regions of interest, and detect changes in the image representative of slack formation. Algorithms employed in this system include bilateral image filters, least-squares polynomial fitting, Canny edge detection, K-Means clustering, Gaussian mixture-based background/foreground segmentation for background subtraction, Hough circle transforms, and Hough line transforms. The resulting detections are filtered and highlighted to indicate to the shipboard operator the presence of slack and the need for a maintenance action. A user interface was designed to provide operators with an easy method to redefine regions of interest and adapt the methods to specific locations. The algorithms were validated on shipboard footage and were able to accurately identify slack with minimal false positives.
https://arxiv.org/abs/2312.02320
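Of the algorithms listed, the least-squares polynomial fit lends itself to a compact illustration. Assuming the cable pixels have already been extracted (here as synthetic (x, y) coordinates), a residual-based slack flag might be sketched as follows; the function name, threshold, and polynomial degree are hypothetical choices for the example, not the system's actual parameters:

```python
import numpy as np

def slack_indicator(xs, ys, degree=1, tol=2.0):
    """Fit a low-order polynomial to cable pixel coordinates and
    report whether any point departs from the fit by more than
    `tol` pixels, a toy version of a residual-based slack test."""
    coeffs = np.polyfit(xs, ys, degree)
    residuals = ys - np.polyval(coeffs, xs)
    return bool(np.any(np.abs(residuals) > tol))

xs = np.arange(0.0, 100.0, 5.0)
taut = 0.2 * xs + 10.0          # a taut cable: points on a straight line
slack = taut.copy()
slack[8:12] += 6.0              # the same cable with a local sag
```

Here `slack_indicator(xs, taut)` stays quiet while `slack_indicator(xs, slack)` fires; the actual system additionally filters detections over time before alerting the operator.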
Diffusion-based image synthesis has attracted extensive attention recently. In particular, ControlNet, which uses image-based prompts such as Canny edge maps, exhibits powerful capability in image tasks and generates images well aligned with these prompts. However, vanilla ControlNet generally requires extensive training of around 5000 steps to achieve a desirable control for a single task. Recent context-learning approaches have improved its adaptability, but mainly for edge-based tasks, and rely on paired examples. Thus, two important open issues are yet to be addressed to reach the full potential of ControlNet: (i) zero-shot control for certain tasks and (ii) faster adaptation for non-edge-based tasks. In this paper, we introduce a novel Meta ControlNet method, which adopts the task-agnostic meta learning technique and features a new layer freezing design. Meta ControlNet significantly reduces the learning steps needed to attain control ability from 5000 to 1000. Further, Meta ControlNet exhibits direct zero-shot adaptability in edge-based tasks without any finetuning, and achieves control within only 100 finetuning steps in more complex non-edge tasks such as Human Pose, outperforming all existing methods. The code is available at this https URL.
https://arxiv.org/abs/2312.01255
Since American Sign Language (ASL) has no standard written form, Deaf signers frequently share videos in order to communicate in their native language. However, since both hands and face convey critical linguistic information in signed languages, sign language videos cannot preserve signer privacy. While signers have expressed interest, for a variety of applications, in sign language video anonymization that would effectively preserve linguistic content, attempts to develop such technology have had limited success, given the complexity of hand movements and facial expressions. Existing approaches rely predominantly on precise pose estimations of the signer in video footage and often require sign language video datasets for training. These requirements prevent them from processing videos 'in the wild,' in part because of the limited diversity present in current sign language video datasets. To address these limitations, our research introduces DiffSLVA, a novel methodology that utilizes pre-trained large-scale diffusion models for zero-shot text-guided sign language video anonymization. We incorporate ControlNet, which leverages low-level image features such as HED (Holistically-Nested Edge Detection) edges, to circumvent the need for pose estimation. Additionally, we develop a specialized module dedicated to capturing facial expressions, which are critical for conveying essential linguistic information in signed languages. We then combine the above methods to achieve anonymization that better preserves the essential linguistic content of the original signer. This innovative methodology makes possible, for the first time, sign language video anonymization that could be used for real-world applications, which would offer significant benefits to the Deaf and Hard-of-Hearing communities. We demonstrate the effectiveness of our approach with a series of signer anonymization experiments.
https://arxiv.org/abs/2311.16060
Depth estimation from a single image is a challenging problem in computer vision because binocular disparity or motion information is absent. While impressive performance has recently been reported in this area using end-to-end trained deep neural architectures, it is hard to know what cues in the images are being exploited by these black-box systems. To this end, in this work, we quantify the relative contributions of the known cues of depth in a monocular depth estimation setting using an indoor scene dataset. Our work uses feature extraction techniques to relate the individual features of shape, texture, colour and saturation, taken in isolation, to predicted depth. We find that the shape of objects extracted by edge detection contributes substantially more than the other features in the indoor setting considered, while the remaining features also contribute to varying degrees. These insights will help optimise depth estimation models, boosting their accuracy and robustness. They promise to broaden the practical applications of vision-based depth estimation. The project code is attached to the supplementary material and will be published on GitHub.
https://arxiv.org/abs/2311.10042
This paper introduces an approach to enhance seismic fault recognition through self-supervised pretraining. Seismic fault interpretation holds great significance in the fields of geophysics and geology. However, conventional methods for seismic fault recognition encounter various issues, including dependence on data quality and quantity, as well as susceptibility to interpreter subjectivity. Currently, automated fault recognition methods trained on small synthetic datasets suffer performance degradation when applied to actual seismic data. To address these challenges, we have introduced the concept of self-supervised learning, utilizing a substantial amount of relatively easily obtainable unlabeled seismic data for pretraining. Specifically, we have employed the Swin Transformer model as the core network and employed the SimMIM pretraining task to capture unique features related to discontinuities in seismic data. During the fine-tuning phase, inspired by edge detection techniques, we have also refined the structure of the Swin-UNETR model, enabling multiscale decoding and fusion for more effective fault detection. Experimental results demonstrate that our proposed method attains state-of-the-art performance on the Thebe dataset, as measured by the OIS and ODS metrics.
https://arxiv.org/abs/2310.17974
Since NASA put forward the concept of the digital twin in 2010, many industries have set dynamic goals for digital development, and the transportation industry is among them. With more and more companies staking out this virgin territory, the digital-twin transportation industry has grown rapidly and gradually formed a complete scientific research system. However, within this largely mature framework, many gaps remain to be filled. In the process of constructing a road network from point cloud information, we summarize several major features of the point clouds collected by laser scanners and analyze potential problems in constructing the network, such as misjudging feature points as ground points, and grid voids. On this basis, we review the relevant literature and propose targeted solutions, such as building a point cloud pyramid modeled after the image pyramid and expanding the virtual grid, applying CSF for ground point cloud extraction, and constructing the road network model using the PTD (progressive density-based filter) algorithm. For the problem of road sign detection, we optimize the remote sensing data in the ground point cloud by enhancing the information density using edge detection, improve the data quality by removing low-intensity points, and achieve 90% accuracy in road text recognition using PaddleOCR and DenseNet. As for real-time digital-twin traffic, we design the P2PRN network using the backbone of MPR-GAN for 2D feature generation and SuperGlue for 2D feature matching, render the viewpoints according to the matching optimization points, complete the multimodal matching task after several iterations, and successfully calculate the road camera position to within 10° and 15 m.
https://arxiv.org/abs/2310.14296
Quantum Sobel edge detection (QSED) is a kind of algorithm for image edge detection using quantum mechanisms, which can address the real-time problem encountered by classical algorithms. However, the existing QSED algorithms only consider the two- or four-direction Sobel operator, which leads to a certain loss of edge detail in some high-definition images. In this paper, a novel QSED algorithm based on the eight-direction Sobel operator is proposed, which not only reduces the loss of edge information but also simultaneously calculates the gradient values in eight directions for all pixels of a quantum image. In addition, the concrete quantum circuits, which consist of gradient calculation, non-maximum suppression, double threshold detection and edge tracking units, are designed in detail. For a 2^n x 2^n image with q gray levels, the complexity of our algorithm can be reduced to O(n^2 + q^2), which is lower than that of other existing classical or quantum algorithms. Simulation experiments demonstrate that our algorithm can detect more edge information, especially diagonal edges, than the two- and four-direction QSED algorithms.
https://arxiv.org/abs/2310.03037
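For reference, the eight directional operators are the 45-degree rotations of the standard 3x3 Sobel kernel. A classical numpy sketch of the gradient computation the quantum circuit parallelizes (not the circuit itself) could read:

```python
import numpy as np

K_E = np.array([[-1.0, 0.0, 1.0],      # east-pointing Sobel kernel
                [-2.0, 0.0, 2.0],
                [-1.0, 0.0, 1.0]])
K_NE = np.array([[ 0.0,  1.0, 2.0],    # its 45-degree counterpart
                 [-1.0,  0.0, 1.0],
                 [-2.0, -1.0, 0.0]])
# Eight directions = four 90-degree rotations of each base kernel.
KERNELS = ([np.rot90(K_E, r) for r in range(4)]
           + [np.rot90(K_NE, r) for r in range(4)])

def conv3(img, k):
    """Valid-mode 3x3 correlation in plain numpy (no padding)."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(3):
        for j in range(3):
            out += k[i, j] * img[i:i + h - 2, j:j + w - 2]
    return out

def eight_direction_magnitude(img):
    """Maximum absolute response over the eight directional kernels."""
    return np.max([np.abs(conv3(img, k)) for k in KERNELS], axis=0)

step = np.zeros((8, 8))
step[:, 4:] = 1.0                       # a vertical step edge
mag = eight_direction_magnitude(step)   # peaks along the edge, 0 elsewhere
```

The diagonal kernels are what the two- and four-direction variants omit, which is why those variants miss diagonal edge detail.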
Generating geometric 3D reconstructions from Neural Radiance Fields (NeRFs) is of great interest. However, accurate and complete reconstructions based on density values are challenging. The network output depends on the input data, the NeRF network configuration and the hyperparameters. As a result, direct usage of density values, e.g. via filtering with global density thresholds, usually requires empirical investigation. Under the assumption that the density increases from non-object to object areas, the utilization of density gradients from relative values is evident. As the density represents a position-dependent parameter, it can be handled anisotropically; processing the voxelized 3D density field is therefore justified. In this regard, we address geometric 3D reconstruction based on density gradients, where the gradients result from 3D edge detection filters of the first and second derivatives, namely Sobel, Canny and Laplacian of Gaussian. The gradients rely on relative neighboring density values in all directions and are thus independent of absolute magnitudes. Consequently, gradient filters are able to extract edges across a wide density range, almost independent of assumptions and empirical investigation. Our approach demonstrates the capability to achieve geometric 3D reconstructions with high geometric accuracy on object surfaces and remarkable object completeness. Notably, the Canny filter effectively eliminates gaps, delivers a uniform point density, and strikes a favorable balance between correctness and completeness across the scenes.
https://arxiv.org/abs/2309.14800
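As a minimal, hedged illustration of the density-gradient idea (using numpy's central differences in place of the paper's 3D Sobel, Canny, and Laplacian-of-Gaussian filters), one can extract the surface of a synthetic voxelized density field from relative gradient magnitudes alone:

```python
import numpy as np

# Synthetic density field: a smooth solid sphere of radius ~10 voxels.
n = 32
ax = np.arange(n) - n / 2.0
X, Y, Z = np.meshgrid(ax, ax, ax, indexing="ij")
r = np.sqrt(X**2 + Y**2 + Z**2)
density = 1.0 / (1.0 + np.exp(r - 10.0))

# Central-difference gradients (stand-in for the 3D edge detection filters).
gx, gy, gz = np.gradient(density)
gmag = np.sqrt(gx**2 + gy**2 + gz**2)

# Keep voxels where the gradient magnitude peaks. The cutoff is relative
# to the field's own maximum, so the mask is unchanged if the density is
# rescaled by any constant factor -- no global absolute threshold needed.
surface = gmag > 0.5 * gmag.max()
```

The selected voxels cluster around radius 10, i.e. on the object surface, while the dense interior and the empty exterior are both rejected without any absolute density threshold.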
Efficient polyp segmentation in healthcare plays a critical role in enabling early diagnosis of colorectal cancer. However, the segmentation of polyps presents numerous challenges, including the intricate distribution of backgrounds, variations in polyp sizes and shapes, and indistinct boundaries. Defining the boundary between the foreground (i.e. the polyp itself) and the background (surrounding tissue) is difficult. To mitigate these challenges, we propose the Multi-Scale Edge-Guided Attention Network (MEGANet), tailored specifically for polyp segmentation within colonoscopy images. This network draws inspiration from the fusion of a classical edge detection technique with an attention mechanism. By combining these techniques, MEGANet effectively preserves high-frequency information, notably edges and boundaries, which tend to erode as neural networks deepen. MEGANet is designed as an end-to-end framework encompassing three key modules: an encoder, which is responsible for capturing and abstracting the features from the input image; a decoder, which focuses on salient features; and the Edge-Guided Attention module (EGA), which employs the Laplacian operator to accentuate polyp boundaries. Extensive experiments, both qualitative and quantitative, on five benchmark datasets demonstrate that MEGANet outperforms other existing SOTA methods under six evaluation metrics. Our code is available at \url{this https URL}
https://arxiv.org/abs/2309.03329
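The Laplacian step in the EGA module can be illustrated classically: the discrete Laplacian responds only where intensity changes bend, so applied to a (toy) foreground mask it lights up exactly on the region boundary. The sketch uses the standard 4-neighbor kernel; the paper's exact operator and normalization may differ:

```python
import numpy as np

LAPLACIAN = np.array([[0.0,  1.0, 0.0],   # standard 4-neighbor Laplacian
                      [1.0, -4.0, 1.0],
                      [0.0,  1.0, 0.0]])

def laplacian(img):
    """Valid-mode 3x3 correlation with the Laplacian kernel."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(3):
        for j in range(3):
            out += LAPLACIAN[i, j] * img[i:i + h - 2, j:j + w - 2]
    return out

mask = np.zeros((10, 10))
mask[3:7, 3:7] = 1.0                      # a square stand-in for a polyp
edge_response = np.abs(laplacian(mask))   # nonzero only on the boundary
```

In MEGANet this high-frequency signal is not used as an edge map on its own; it guides attention so that boundary information is accentuated rather than eroded in deeper layers.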
This paper presents Deep Networks for Improved Segmentation Edges (DeNISE), a novel data enhancement technique using edge detection and segmentation models to improve the boundary quality of segmentation masks. DeNISE utilizes the inherent differences between two sequential deep neural architectures to improve the accuracy of the predicted segmentation edge. DeNISE applies to all types of neural networks and is not trained end-to-end, allowing rapid experiments to discover which models complement each other. We test and apply DeNISE for building segmentation in aerial images. Aerial images are known for difficult conditions, as they have low resolution and optical noise such as reflections, shadows, and visual obstructions. Overall, the paper demonstrates the potential of DeNISE. Using the technique, we improve on the baseline results, achieving a building IoU of 78.9%.
https://arxiv.org/abs/2309.02091
We introduce a novel approach for image edge detection based on pseudo-Boolean polynomials for image patches. We show that patches covering edge regions in the image result in pseudo-Boolean polynomials with higher degrees compared to patches that cover blob regions. The proposed approach is based on reduction of polynomial degree and equivalence properties of penalty-based pseudo-Boolean polynomials.
https://arxiv.org/abs/2308.15557
We introduce a deterministic approach to edge detection and image segmentation by formulating pseudo-Boolean polynomials on image patches. The approach works by applying a binary classification of blob and edge regions in an image based on the degrees of pseudo-Boolean polynomials calculated on patches extracted from the provided image. We test our method on simple images containing primitive shapes of constant and contrasting colour and establish the feasibility before applying it to complex instances like aerial landscape images. The proposed method is based on the exploitation of the reduction, polynomial degree, and equivalence properties of penalty-based pseudo-Boolean polynomials.
https://arxiv.org/abs/2308.15453
Edge detection, a core component in a wide range of vision-oriented tasks, aims to identify object boundaries and prominent edges in natural images. An edge detector should be both efficient and accurate for practical use. To achieve this goal, two key issues must be addressed: 1) how to liberate deep edge models from the inefficient pre-trained backbones leveraged by most existing deep learning methods, to save computational cost and cut model size; and 2) how to mitigate the negative influence of noisy or even wrong labels in the training data, which are widespread in edge detection due to the subjectivity and ambiguity of annotators, for robustness and accuracy. In this paper, we attempt to address both problems simultaneously by developing a collaborative-learning-based model, termed PEdger. The principle behind PEdger is that information learned from different training moments and heterogeneous (recurrent and non-recurrent, in this work) architectures can be assembled to extract knowledge robust to noisy annotations, even without the help of pre-training on extra data. Extensive ablation studies, together with quantitative and qualitative experimental comparisons on the BSDS500 and NYUD datasets, verify the effectiveness of our design and demonstrate its superiority over other competitors in terms of accuracy, speed, and model size. Codes can be found at this https URL.
https://arxiv.org/abs/2308.14084
This paper proposes a novel zero-shot edge detection with SCESAME, which stands for Spectral Clustering-based Ensemble for Segment Anything Model Estimation, based on the recently proposed Segment Anything Model (SAM). SAM is a foundation model for segmentation tasks, and one of the interesting applications of SAM is Automatic Mask Generation (AMG), which generates zero-shot segmentation masks of an entire image. AMG can be applied to edge detection, but suffers from the problem of overdetecting edges. Edge detection with SCESAME overcomes this problem in three steps: (1) eliminating small generated masks, (2) combining masks by spectral clustering, taking into account mask positions and overlaps, and (3) removing artifacts after edge detection. We performed edge detection experiments on two datasets, BSDS500 and NYUDv2. Although our zero-shot approach is simple, the experimental results on BSDS500 showed almost identical performance to human performance and to CNN-based methods from seven years ago. In the NYUDv2 experiments, it performed almost as well as recent CNN-based methods. These results indicate that our method has the potential to be a strong baseline for future zero-shot edge detection methods. Furthermore, SCESAME is applicable not only to edge detection but also to other downstream zero-shot tasks.
https://arxiv.org/abs/2308.13779
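Steps (1) and (2) can be caricatured in a few lines. The sketch below drops small masks and then groups the survivors via connected components of a pairwise-IoU graph; this greedy grouping is a deliberate simplification standing in for the paper's spectral clustering over mask positions and overlaps, and the area/IoU thresholds are invented for the example:

```python
import numpy as np

def merge_masks(masks, min_area=20, iou_thr=0.2):
    """Drop tiny masks, then union masks connected by high overlap.

    A simplified stand-in for SCESAME steps (1)-(2): the paper merges
    masks by spectral clustering; here plain connected components over
    the pairwise-IoU graph play that role."""
    masks = [m for m in masks if m.sum() >= min_area]
    n = len(masks)
    adj = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(n):
            inter = np.logical_and(masks[i], masks[j]).sum()
            union = np.logical_or(masks[i], masks[j]).sum()
            adj[i, j] = union > 0 and inter / union > iou_thr
    # Connected components via depth-first search over the overlap graph.
    group, g = [-1] * n, 0
    for s in range(n):
        if group[s] != -1:
            continue
        stack = [s]
        while stack:
            v = stack.pop()
            if group[v] != -1:
                continue
            group[v] = g
            stack.extend(np.nonzero(adj[v])[0].tolist())
        g += 1
    return [np.any([m for m, gr in zip(masks, group) if gr == k], axis=0)
            for k in range(g)]
```

Edges would then be taken from the boundaries of the merged masks, with step (3) cleaning up residual artifacts.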
The reconstruction of textureless areas has long been a challenging problem in MVS due to the lack of reliable pixel correspondences between images. In this paper, we propose Textureless-aware Segmentation And Correlative Refinement guided Multi-View Stereo (TSAR-MVS), a novel method that effectively tackles the challenges posed by textureless areas in 3D reconstruction through filtering, refinement and segmentation. First, we implement joint hypothesis filtering, a technique that merges a confidence estimator with a disparity discontinuity detector to eliminate incorrect depth estimates. Second, to spread pixels with confident depth, we introduce an iterative correlative refinement strategy that leverages RANSAC to generate superpixels, followed by a median filter that broadens the influence of accurately determined pixels. Finally, we present a textureless-aware segmentation method that leverages edge detection and line detection to accurately identify large textureless regions to be fitted with 3D planes. Experiments on extensive datasets demonstrate that our method significantly outperforms most non-learning methods and exhibits robustness to textureless areas while preserving fine details.
https://arxiv.org/abs/2308.09990
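The "spread confident depth" step can be sketched independently of the full pipeline. Below, low-confidence pixels are repeatedly filled with the median of their confident 3x3 neighbors; the real method first grows RANSAC-based superpixels and uses the median filter only to broaden accurate pixels' influence, so this loop is a simplified, hypothetical version:

```python
import numpy as np

def spread_confident_depth(depth, conf, iters=5):
    """Iteratively assign each low-confidence pixel the median depth
    of its confident 3x3 neighbors, growing reliable depth outward."""
    depth, conf = depth.copy(), conf.copy()
    h, w = depth.shape
    for _ in range(iters):
        new_depth, new_conf = depth.copy(), conf.copy()
        for y in range(h):
            for x in range(w):
                if conf[y, x]:
                    continue
                ys = slice(max(y - 1, 0), y + 2)
                xs = slice(max(x - 1, 0), x + 2)
                vals = depth[ys, xs][conf[ys, xs]]
                if vals.size:                 # any confident neighbor?
                    new_depth[y, x] = np.median(vals)
                    new_conf[y, x] = True
        depth, conf = new_depth, new_conf
    return depth, conf

# A fronto-parallel plane at depth 5; the right half is unreliable noise.
plane = np.full((6, 6), 5.0)
plane[:, 3:] = 0.0
confident = np.zeros((6, 6), dtype=bool)
confident[:, :3] = True
filled, filled_conf = spread_confident_depth(plane, confident)
```

After a few iterations the reliable depth has propagated across the whole plane; on real scenes the propagation would of course stop at depth discontinuities found by the filtering stage.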