The rapid advancement of automated artificial intelligence algorithms and remote sensing instruments has benefited change detection (CD) tasks. However, there is still considerable room for improvement in precise detection, particularly regarding the edge integrity and internal holes of change features. To address these problems, we design the Change Guiding Network (CGNet) to tackle the insufficient expression of change features in the conventional U-Net structure adopted by previous methods, which causes inaccurate edge detection and internal holes. Change maps generated from deep features with rich semantic information are used as prior information to guide multi-scale feature fusion, which improves the expression ability of change features. Meanwhile, we propose a self-attention module named the Change Guide Module (CGM), which effectively captures long-distance dependencies among pixels and overcomes the insufficient receptive field of traditional convolutional neural networks. We verify the usefulness and efficiency of CGNet on four major CD datasets, and extensive experiments and ablation studies demonstrate its effectiveness. Our code will be open-sourced at this https URL.
https://arxiv.org/abs/2404.09179
We propose a novel method for geolocalizing Unmanned Aerial Vehicles (UAVs) in environments lacking Global Navigation Satellite Systems (GNSS). Current state-of-the-art techniques employ an offline-trained encoder to generate a vector representation (embedding) of the UAV's current view, which is then compared with pre-computed embeddings of geo-referenced images to determine the UAV's position. Here, we demonstrate that the performance of these methods can be significantly enhanced by preprocessing the images to extract their edges, which exhibit robustness to seasonal and illumination variations. Furthermore, we establish that utilizing edges enhances resilience to orientation and altitude inaccuracies. Additionally, we introduce a confidence criterion for localization. Our findings are substantiated through synthetic experiments.
https://arxiv.org/abs/2404.06207
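The retrieval step described above can be sketched in a few lines: compare the query embedding against pre-computed embeddings of geo-referenced images with cosine similarity, and use the gap between the best and second-best matches as a simple confidence criterion. This sketch is an illustrative assumption, not the paper's actual confidence formulation, and all names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def localize(query, references):
    """Return (best_index, confidence): index of the most similar
    reference embedding, plus the gap between the best and
    second-best similarity scores as a confidence proxy."""
    scores = sorted(((cosine(query, r), i) for i, r in enumerate(references)),
                    reverse=True)
    (s1, i1), (s2, _) = scores[0], scores[1]
    return i1, s1 - s2
```

A small best-vs-second-best gap would flag an ambiguous match, which is one plausible way to reject unreliable localizations.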
Edge detection is a long-standing problem in computer vision. Recent deep learning based algorithms achieve state-of-the-art performance on publicly available datasets. Despite their efficiency, the performance of these algorithms relies heavily on the pretrained weights of the backbone network on the ImageNet dataset, which heavily limits the design space of deep learning based edge detectors. Whenever we want to devise a new model, we have to train it on the ImageNet dataset first and then fine-tune it using the edge detection datasets; the comparison would be unfair otherwise. However, training a model on the ImageNet dataset is usually not feasible for many researchers due to limited computation resources. In this work, we study the performance that state-of-the-art deep learning based edge detectors can achieve on publicly available datasets when trained from scratch, and devise a new network architecture, the multi-stream and multi-scale fusion net (msmsfnet), for edge detection. We show in our experiments that, by training all models from scratch to ensure fairness of comparison, our model outperforms state-of-the-art deep learning based edge detectors on three publicly available datasets.
https://arxiv.org/abs/2404.04856
This paper presents a novel monocular depth estimation method, named ECFNet, for estimating high-quality monocular depth with clear edges and a valid overall structure from a single RGB image. We thoroughly investigate the key factor affecting edge depth estimation in monocular depth estimation (MDE) networks, and conclude that the edge information itself plays a critical role in predicting depth details. Driven by this analysis, we propose to explicitly employ the image edges as input for ECFNet and to fuse the initial depths from different sources to produce the final depth. Specifically, ECFNet first uses a hybrid edge detection strategy to obtain the edge map and an edge-highlighted image from the input image, and then leverages a pre-trained MDE network to infer the initial depths of the aforementioned three images. After that, ECFNet utilizes a layered fusion module (LFM) to fuse the initial depths, which are further updated by a depth consistency module (DCM) to form the final estimation. Extensive experimental results on public datasets and ablation studies indicate that our method achieves state-of-the-art performance. Project page: this https URL.
https://arxiv.org/abs/2404.00373
Mask-guided matting networks have achieved significant improvements and shown great potential in practical applications in recent years. However, because they learn matting representations only from synthetic data lacking real-world diversity, these approaches tend to overfit low-level details in wrong regions, generalize poorly to objects with complex structures and to real-world scenes such as shadows, and suffer from interference from background lines or textures. To address these challenges, in this paper, we propose a novel auxiliary learning framework for mask-guided matting models, incorporating three auxiliary tasks besides matting: semantic segmentation, edge detection, and background line detection, to learn different and effective representations from different types of data and annotations. Our framework and model introduce the following key aspects: (1) to learn real-world adaptive semantic representations for objects with diverse and complex structures under real-world scenes, we introduce extra semantic segmentation and edge detection tasks on more diverse real-world data with segmentation annotations; (2) to avoid overfitting on low-level details, we propose a module that utilizes the inconsistency between the learned segmentation and matting representations to regularize detail refinement; (3) we introduce a novel background line detection task into our auxiliary learning framework to suppress interference of background lines or textures. In addition, we propose a high-quality matting benchmark, Plant-Mat, to evaluate matting methods on complex structures. Extensive quantitative and qualitative results show that our approach outperforms state-of-the-art mask-guided methods.
https://arxiv.org/abs/2403.19213
Abstract art is an immensely popular and widely discussed form of art that often depicts the emotions of an artist. Many researchers have attempted to study abstract art with edge detection, brush stroke, and emotion recognition algorithms using machine and deep learning. This paper describes the study of a wide distribution of abstract paintings using Generative Adversarial Networks (GANs). GANs have the ability to learn and reproduce a distribution, enabling researchers and scientists to effectively explore and study the generated image space. However, the challenge lies in developing an efficient GAN architecture that overcomes common training pitfalls. This paper addresses this challenge by introducing a modified DCGAN (mDCGAN) specifically designed for high-quality artwork generation. The approach involves a thorough exploration of the modifications made, delving into the intricate workings of DCGANs, optimisation techniques, and regularisation methods aimed at improving stability and realism in art generation, enabling effective study of generated patterns. The proposed mDCGAN incorporates meticulous adjustments in layer configurations and architectural choices, offering tailored solutions to the unique demands of art generation while effectively combating issues like mode collapse and gradient vanishing. Further, this paper explores the generated latent space by performing random walks to understand vector relationships between brush strokes and colours in the abstract art space, and presents a statistical analysis of unstable outputs after a certain period of GAN training to compare their significant differences. These findings validate the effectiveness of the proposed approach, emphasising its potential to revolutionise the field of digital art generation and the digital art ecosystem.
https://arxiv.org/abs/2403.18397
Automatic measurement of tear meniscus height (TMH) has been achieved using deep learning techniques; however, annotation is significantly influenced by subjective factors and is both time-consuming and labor-intensive. In this paper, we introduce an automatic TMH measurement technique based on edge detection-assisted annotation within a deep learning framework. This method generates mask labels that are less affected by subjective factors, with enhanced efficiency compared to previous annotation approaches. For improved segmentation of the pupil and tear meniscus areas, the convolutional neural network Inceptionv3 was first implemented as an image quality assessment model, effectively identifying higher-quality images with an accuracy of 98.224%. Subsequently, using the generated labels, various algorithms, including Unet, ResUnet, Deeplabv3+FcnResnet101, Deeplabv3+FcnResnet50, FcnResnet50, and FcnResnet101, were trained, with Unet demonstrating the best performance. Finally, Unet was used for automatic pupil and tear meniscus segmentation to locate the center of the pupil and to calculate TMH. An evaluation of the mask quality predicted by Unet indicated a Mean Intersection over Union of 0.9362, a recall of 0.9261, a precision of 0.9423, and an F1-score of 0.9326. Additionally, the TMH predicted by the model was assessed, with the fitting curve represented as y = 0.982x - 0.862, an overall correlation coefficient of r^2 = 0.961, and an accuracy of 94.80% (237/250). In summary, the algorithm can automatically screen images based on their quality, segment the pupil and tear meniscus areas, and automatically measure TMH. Measurement results using the AI algorithm demonstrate a high level of consistency with manual measurements, offering significant support to clinical doctors in diagnosing dry eye disease.
https://arxiv.org/abs/2403.15853
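The mask-quality figures quoted above (IoU, precision, recall, F1) are standard pixel-wise metrics that can be reproduced from a pair of binary masks; a minimal pure-Python sketch (function name and flat-list mask layout are my own):

```python
def mask_metrics(pred, truth):
    """Pixel-wise IoU, precision, recall and F1 for flat binary masks
    (lists of 0/1 values of equal length)."""
    tp = sum(p and t for p, t in zip(pred, truth))        # true positives
    fp = sum(p and not t for p, t in zip(pred, truth))    # false positives
    fn = sum((not p) and t for p, t in zip(pred, truth))  # false negatives
    iou = tp / (tp + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return iou, precision, recall, f1
```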
The maintenance, archiving, and use of design drawings in physical form is cumbersome across industries over long periods, and it is hard to extract information by simply scanning drawing sheets. Converting them to digital formats such as Computer-Aided Design (CAD), with the needed knowledge extraction, can solve this problem. The conversion of these machine drawings to digital form is a crucial challenge that requires advanced techniques. This research proposes an innovative methodology utilizing deep learning methods. The approach employs object detection models, such as Yolov7 and Faster R-CNN, to detect the drawing objects present in the images, followed by edge detection algorithms such as the Canny filter to extract and refine the identified lines from the drawing region, and curve detection techniques to detect circles. Ornaments (complex shapes) within the drawings are also extracted. To ensure comprehensive conversion, an Optical Character Recognition (OCR) tool is integrated to identify and extract the text elements from the drawings. The extracted data, which includes lines, shapes, and text, is consolidated and stored in a structured comma-separated values (.csv) file format, and the accuracy and efficiency of the conversion are evaluated. Through this, conversion can be automated to help organizations enhance productivity, facilitate seamless collaboration, and preserve valuable design information in an easily accessible digital format. Overall, this study contributes to the advancement of CAD conversion, providing accurate results from the translation process. Future research can focus on handling diverse drawing types and enhancing accuracy in shape and line detection and extraction.
https://arxiv.org/abs/2403.11291
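As one concrete instance of the edge detection step in such a pipeline, a plain Sobel gradient-magnitude threshold can serve as a simpler stand-in for the full Canny pipeline (which additionally performs smoothing, non-maximum suppression, and hysteresis). A toy sketch on a grid of intensities, with names of my own choosing:

```python
# 3x3 Sobel kernels for horizontal and vertical intensity gradients
SX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def gradient_edges(img, thresh):
    """Mark pixels whose Sobel gradient magnitude reaches `thresh`.
    img: 2-D list of intensities; returns a same-size 0/1 edge map
    (border pixels are left as 0)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SX[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SY[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = 1 if (gx * gx + gy * gy) ** 0.5 >= thresh else 0
    return out
```

In practice a library implementation (e.g. OpenCV's Canny) would be used; this only illustrates the gradient-threshold idea behind line extraction.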
We propose Texture Edge detection using Patch consensus (TEP), a training-free method to detect texture boundaries. We propose a new, simple way to identify the texture edge location using the consensus of segmented local patch information. On the boundary, even with local patch information, the distinction between textures is typically not clear, but neighbor consensus gives a clear indication of the boundary. We utilize the local patch, and its response against neighboring regions, to emphasize the similarities and differences across different textures. The segmentation of the response further emphasizes the edge location, and the neighborhood voting provides consensus and stabilizes the edge detection. We analyze texture as a stationary process to give insight into the patch width parameter versus the quality of edge detection. We derive the necessary condition for textures to be distinguished, and analyze the patch width with respect to the scale of the textures. Various experiments are presented to validate the proposed model.
https://arxiv.org/abs/2403.11038
Federated learning (FL) empowers privacy-preservation in model training by only exposing users' model gradients. Yet, FL users are susceptible to the gradient inversion (GI) attack, which can reconstruct ground-truth training data such as images from model gradients. However, reconstructing high-resolution images with existing GI attacks faces two challenges: inferior accuracy and slow convergence, especially when the context is complicated, e.g., when the training batch size is much greater than 1 on each FL user. To address these challenges, we present a Robust, Accurate and Fast-convergent GI attack algorithm, called RAF-GI, with two components: 1) an Additional Convolution Block (ACB), which can restore labels with up to 20% improvement compared with existing works; 2) a Total variance, three-channel mEan and cAnny edge detection regularization term (TEA), a white-box attack strategy that reconstructs images based on the labels inferred by ACB. Moreover, RAF-GI is robust: it can still accurately reconstruct ground-truth data when the users' training batch size is no more than 48. Our experimental results show that RAF-GI reduces time costs by 94% while achieving superb inversion quality on the ImageNet dataset. Notably, with a batch size of 1, RAF-GI exhibits a 7.89 higher Peak Signal-to-Noise Ratio (PSNR) than the state-of-the-art baselines.
https://arxiv.org/abs/2403.08383
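The "T" in TEA, total variation, is a standard image prior that penalizes differences between neighboring pixels so that reconstructions stay piecewise smooth. A minimal sketch of the anisotropic variant follows; the exact form used by RAF-GI is not specified here, so this is an assumption:

```python
def total_variation(img):
    """Anisotropic total variation of a 2-D intensity grid: the sum of
    absolute differences between horizontally and vertically adjacent
    pixels. Lower values correspond to smoother images."""
    h, w = len(img), len(img[0])
    tv = 0.0
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                tv += abs(img[y][x + 1] - img[y][x])  # horizontal neighbor
            if y + 1 < h:
                tv += abs(img[y + 1][x] - img[y][x])  # vertical neighbor
    return tv
```

In a GI attack this term would be added, with a weight, to the gradient-matching objective being minimized.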
As a new distributed computing framework that can protect data privacy, federated learning (FL) has attracted more and more attention in recent years. It receives gradients from users to train the global model and releases the trained global model to working users. Nonetheless, the gradient inversion (GI) attack reflects the risk of privacy leakage in federated learning. Attackers only need the gradients and hundreds of thousands of simple iterations to obtain relatively accurate reconstructions of the private data stored on users' local devices. For this reason, some works propose simple but effective strategies to obtain user data under single-label datasets. However, these strategies achieve a satisfactory visual quality of the inverted image at the expense of higher time costs. Due to the semantic limitation of a single label, the image obtained by gradient inversion may contain semantic errors. We present a novel gradient inversion strategy based on Canny edge detection (MGIC) for both multi-label and single-label datasets. To reduce semantic errors caused by a single label, we add new convolution blocks to the trained model to obtain the image's multiple labels. Through this multi-label representation, serious semantic errors in the inverted images are reduced. We then analyze the impact of parameters on the difficulty of input image reconstruction and discuss how multiple subjects in an image affect inversion performance. Our proposed strategy yields better visual inversion results than the most widely used ones while saving more than 78% of the time costs on the ImageNet dataset.
https://arxiv.org/abs/2403.08284
Detecting edges in images suffers from the problems of (P1) heavy imbalance between positive and negative classes and (P2) label uncertainty owing to disagreement between different annotators. Existing solutions address P1 using class-balanced cross-entropy loss and dice loss, and P2 by only predicting edges agreed upon by most annotators. In this paper, we propose RankED, a unified ranking-based approach that addresses both the imbalance problem (P1) and the uncertainty problem (P2). RankED tackles these two problems with two components: one ranks positive pixels over negative pixels, and the other promotes high-confidence edge pixels to have more label certainty. We show that RankED outperforms previous studies and sets a new state-of-the-art on the NYUD-v2, BSDS500, and Multi-cue datasets. Code is available at this https URL.
https://arxiv.org/abs/2403.01795
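A generic way to "rank positive pixels over negative pixels" is a pairwise hinge loss that penalizes any positive pixel not scoring a margin above any negative pixel. The sketch below is a conceptual stand-in of my own, not RankED's actual ranking objective:

```python
def pairwise_ranking_loss(pos_scores, neg_scores, margin=1.0):
    """Mean hinge penalty over all (positive, negative) pixel-score
    pairs; zero when every positive scores at least `margin` above
    every negative, so minimizing it pushes positives above negatives."""
    losses = [max(0.0, margin - p + n)
              for p in pos_scores for n in neg_scores]
    return sum(losses) / len(losses)
```

Because every positive is compared against every negative, such a loss is insensitive to the raw class ratio, which is why ranking-style objectives suit the heavy imbalance of edge maps.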
Accurate segmentation of COVID-19 CT images is crucial for reducing the severity and mortality rates associated with COVID-19 infections. In response to the blurred boundaries and high variability characteristic of lesion areas in COVID-19 CT images, we introduce CDSE-UNet: a novel UNet-based segmentation model that integrates Canny operator edge detection and a dual-path SENet feature fusion mechanism. This model enhances the standard UNet architecture by employing the Canny operator for edge detection in sample images, paralleling this with a similar network structure for semantic feature extraction. A key innovation is the Double SENet Feature Fusion Block, applied across corresponding network layers to effectively combine features from both image paths. Moreover, we have developed a Multiscale Convolution approach, replacing the standard convolution in UNet, to adapt to varied lesion sizes and shapes. This addition not only aids in accurately classifying lesion edge pixels but also significantly improves channel differentiation and expands the capacity of the model. Our evaluations on public datasets demonstrate CDSE-UNet's superior performance over other leading models, particularly in segmenting large and small lesion areas, accurately delineating lesion edges, and effectively suppressing noise.
https://arxiv.org/abs/2403.01513
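The SENet-style recalibration that the fusion block builds on can be sketched as squeeze (global average pooling per channel), excitation (two small fully connected layers with ReLU then sigmoid), and channel-wise scaling. This toy pure-Python version, with hypothetical weight arguments, illustrates only the general SE mechanism, not CDSE-UNet's actual block:

```python
import math

def se_recalibrate(feats, w1, w2):
    """Squeeze-and-Excitation sketch.
    feats: list of channels, each a flat list of activations.
    w1: reduction weights (hidden x channels), w2: expansion weights
    (channels x hidden) -- both hypothetical stand-ins for learned FC layers.
    Returns each channel scaled by its learned sigmoid gate."""
    squeezed = [sum(ch) / len(ch) for ch in feats]                   # squeeze
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed)))    # FC + ReLU
              for row in w1]
    gates = [1 / (1 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]                                          # FC + sigmoid
    return [[v * g for v in ch] for ch, g in zip(feats, gates)]      # excite
```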
Edge detection as a pre-processing stage is a fundamental and important aspect of a number plate extraction system, because each number plate is unique to a vehicle, making the identification of a particular vehicle achievable through its number plate. The characters of a number plate, which differ in lines and shapes, can be extracted using the principle of edge detection. This paper presents a method of number plate extraction using edge detection techniques. Edges in number plates are identified by changes in the intensity of pixel values, using either a single-pixel-based approach or a collection-of-pixels-based approach. The efficiency of these edge detection approaches for number plate extraction is evaluated experimentally in both noisy and clean environments. Experimental results are obtained in MATLAB 2017b using the Pratt Figure of Merit (PFOM) as a performance metric.
https://arxiv.org/abs/2402.18251
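The Pratt Figure of Merit used as the performance metric has a standard closed form: each detected edge pixel is scored by its squared distance to the nearest ideal edge pixel, and the sum is normalized by the larger of the two edge counts. A small sketch (the coordinate-list representation is my own choice):

```python
def pratt_fom(detected, ideal, alpha=1 / 9):
    """Pratt's Figure of Merit between detected and ideal edge pixel sets.
    detected, ideal: lists of (x, y) edge coordinates.
    Returns a value in (0, 1]; 1.0 means a perfect match."""
    score = 0.0
    for (dx, dy) in detected:
        # squared distance to the nearest ideal edge pixel
        d2 = min((dx - ix) ** 2 + (dy - iy) ** 2 for (ix, iy) in ideal)
        score += 1.0 / (1.0 + alpha * d2)
    # normalizing by the larger count penalizes both missed and spurious edges
    return score / max(len(detected), len(ideal))
```

The scaling constant alpha is conventionally 1/9; noisier detections displace edge pixels and lower the score smoothly rather than all-or-nothing.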
This work focuses on using advanced techniques for structural health monitoring (SHM) of bridges with ongoing traffic. We propose an approach using deep reinforcement learning (DRL)-based control of an Unmanned Aerial Vehicle (UAV). Our approach conducts a concrete bridge deck survey while traffic is ongoing and detects cracks. The UAV performs the crack detection, and the locations of cracks are initially unknown. We use two detection techniques: first, Canny edge detection; second, a Convolutional Neural Network (CNN), which we compare with Canny edge detection. Transfer learning is applied to the CNN using pre-trained weights obtained from a crack image dataset, enabling the model to adapt and improve its performance in identifying and localizing cracks. Proximal Policy Optimization (PPO) is applied for UAV control and bridge surveys. Experimentation across various scenarios is performed to evaluate the performance of the proposed methodology, observing key metrics such as task completion time and reward convergence. We find that the Canny edge detector offers up to 40% lower task completion time, while the CNN achieves up to 12% better damage detection and 1.8 times higher rewards.
https://arxiv.org/abs/2402.14757
Light plays a vital role in vision, whether human or machine: the perceived color always depends on the lighting conditions of the surroundings. Researchers are working to enhance color detection techniques for computer vision applications. Several methods using different color detection approaches have been proposed, but a gap remains to be filled. To address this issue, we propose a color detection method based on a Convolutional Neural Network (CNN). First, image segmentation is performed using an edge detection segmentation technique to isolate the object; the segmented object is then fed to a CNN trained to detect the color of an object under different lighting conditions. It is experimentally verified that our method substantially enhances the robustness of color detection under different lighting conditions and outperforms existing methods.
https://arxiv.org/abs/2402.04762
Machine vision and image processing are often used with sensors for situation awareness in autonomous systems, from industrial robots to self-driving cars. The 3D depth sensors, such as LiDAR (Light Detection and Ranging) and radar, are great inventions for autonomous systems. Due to the complexity of the setup, however, LiDAR may not be suitable for some operational environments, for example, a space environment. This study was motivated by a desire to obtain real-time volumetric and change information with multiple 2D cameras instead of a depth camera. Two cameras were used to measure the dimensions of a rectangular object in real time. The R-C-P (row-column-pixel) method is developed using image processing and edge detection. In addition to surface areas, the R-C-P method also detects discontinuous edges or volumes. Lastly, experimental work is presented to illustrate the R-C-P method, which provides the equations for calculating surface area dimensions. Using these equations with given distance information between the object and the camera, the vision system provides the dimensions of actual objects.
https://arxiv.org/abs/2308.10058
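The paper's R-C-P equations are not reproduced here, but the underlying geometry is the standard pinhole relation between pixel extent, object distance, and focal length. A hedged sketch of that generic relation (not the paper's actual formula):

```python
def object_width(pixel_width, distance, focal_length_px):
    """Pinhole-camera estimate of real object width:
    real width = (pixels spanned * distance to object) / focal length,
    with the focal length expressed in pixels. The result is in the
    same length unit as `distance`."""
    return pixel_width * distance / focal_length_px
```

For example, an edge spanning 100 pixels seen 2 m away through a lens with a 1000-pixel focal length corresponds to about 0.2 m of real width.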
Recently, there have been tremendous efforts in developing lightweight Deep Neural Networks (DNNs) with satisfactory accuracy, which can enable the ubiquitous deployment of DNNs in edge devices. The core challenge of developing compact and efficient DNNs lies in how to balance the competing goals of achieving high accuracy and high efficiency. In this paper we propose two novel types of convolutions, dubbed \emph{Pixel Difference Convolution (PDC) and Binary PDC (Bi-PDC)} which enjoy the following benefits: capturing higher-order local differential information, computationally efficient, and able to be integrated with existing DNNs. With PDC and Bi-PDC, we further present two lightweight deep networks named \emph{Pixel Difference Networks (PiDiNet)} and \emph{Binary PiDiNet (Bi-PiDiNet)} respectively to learn highly efficient yet more accurate representations for visual tasks including edge detection and object recognition. Extensive experiments on popular datasets (BSDS500, ImageNet, LFW, YTF, \emph{etc.}) show that PiDiNet and Bi-PiDiNet achieve the best accuracy-efficiency trade-off. For edge detection, PiDiNet is the first network that can be trained without ImageNet, and can achieve the human-level performance on BSDS500 at 100 FPS and with $<$1M parameters. For object recognition, among existing Binary DNNs, Bi-PiDiNet achieves the best accuracy and a nearly $2\times$ reduction of computational cost on ResNet18. Code available at \href{this https URL}{this https URL}.
https://arxiv.org/abs/2402.00422
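The central variant of pixel difference convolution replaces raw intensities with differences against the patch centre before applying the kernel, so the response reflects local differential structure rather than absolute brightness. A toy single-patch sketch (the actual PiDiNet operators cover several difference patterns and run as full convolutions over feature maps):

```python
def pdc_central(patch, weights):
    """Central Pixel Difference Convolution on one 3x3 patch:
    the kernel weights are applied to (pixel - centre) differences,
    so a constant patch always yields a zero response."""
    c = patch[1][1]  # centre pixel
    return sum(weights[j][i] * (patch[j][i] - c)
               for j in range(3) for i in range(3))
```

Note the invariance to a constant offset of the whole patch, which is what lets such operators capture gradient-like, edge-relevant information.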
Numerical difference computation is a core and indispensable operation in the modern digital era. Tao general difference (TGD) is a novel theory and approach to difference computation for discrete sequences and arrays in multidimensional space. Built on the solid theoretical foundation of the general difference over a finite interval, TGD operators demonstrate exceptional signal processing capabilities in real-world applications. A novel smoothness property of a sequence is defined on the first and second TGD. This property is used to denoise one-dimensional signals, where the noise consists of the non-smooth points in the sequence. Meanwhile, the center of the gradient in a finite interval can be accurately located via TGD calculation, solving a traditional challenge in computer vision: the precise, noise-robust localization of image edges. Furthermore, the power of TGD operators extends to spatio-temporal edge detection in three-dimensional arrays, enabling the identification of kinetic edges in video data. These diverse applications highlight the properties of TGD in the discrete domain and its significant promise for computation across signal processing, image analysis, and video analytics.
https://arxiv.org/abs/2401.15287
This review presents various image segmentation methods that use complex networks. Image segmentation is one of the important steps in image analysis, as it helps analyze and understand complex images. First, complex-network approaches are classified according to how they are used in image segmentation. In computer vision and image processing applications, image segmentation is essential for analyzing complex images with irregular shapes, textures, or overlapping boundaries. Advanced algorithms make use of machine learning, clustering, edge detection, and region-growing techniques. Graph theory principles combined with community detection-based methods allow for more precise analysis and interpretation of complex images. Hybrid approaches combine multiple techniques for comprehensive, robust segmentation, improving results in computer vision and image processing tasks.
https://arxiv.org/abs/2401.02758
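Of the techniques listed, region growing is easy to show compactly: starting from a seed pixel, absorb 4-connected neighbours whose intensity stays within a tolerance of the seed value. A minimal sketch (names and data layout are my own):

```python
from collections import deque

def region_grow(img, seed, tol):
    """Grow a region from `seed` (x, y) over 4-connected pixels whose
    intensity is within `tol` of the seed's value; returns the set of
    member (x, y) coordinates."""
    h, w = len(img), len(img[0])
    base = img[seed[1]][seed[0]]
    region, frontier = {seed}, deque([seed])
    while frontier:
        x, y = frontier.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < w and 0 <= ny < h and (nx, ny) not in region \
                    and abs(img[ny][nx] - base) <= tol:
                region.add((nx, ny))
                frontier.append((nx, ny))
    return region
```

Community detection on a pixel-adjacency graph generalizes this idea: instead of a fixed tolerance from one seed, groups of mutually similar pixels are found globally.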