Stereo image super-resolution (SR) refers to the reconstruction of a high-resolution (HR) image from a pair of low-resolution (LR) images, as typically captured by a dual-camera device. To enhance the quality of SR images, most previous studies focused on increasing the number and size of feature maps and introducing complex and computationally intensive structures, resulting in models with high computational complexity. Here, we propose a simple yet efficient stereo image SR model called NAFRSSR, which modifies the previous state-of-the-art model NAFSSR by introducing recursive connections and making the constituent modules lightweight. Our NAFRSSR model is composed of nonlinear activation free and group convolution-based blocks (NAFGCBlocks) and depth-separated stereo cross attention modules (DSSCAMs). The NAFGCBlock improves feature extraction and reduces the number of parameters by removing the simple channel attention mechanism from NAFBlock and using group convolution. The DSSCAM enhances feature fusion and reduces the number of parameters by replacing the 1x1 pointwise convolution in SCAM with a weight-shared 3x3 depthwise convolution. In addition, we propose incorporating a trainable edge detection operator into NAFRSSR to further improve model performance. Four variants of NAFRSSR with different sizes, namely NAFRSSR-Mobile (NAFRSSR-M), NAFRSSR-Tiny (NAFRSSR-T), NAFRSSR-Super (NAFRSSR-S), and NAFRSSR-Base (NAFRSSR-B), are designed, and they all exhibit fewer parameters, higher PSNR/SSIM, and faster speed than the previous state-of-the-art models. In particular, to the best of our knowledge, NAFRSSR-M is the lightest (0.28M parameters) and fastest (50 ms inference time) model achieving an average PSNR/SSIM as high as 24.657 dB/0.7622 on the benchmark datasets. Codes and models will be released at this https URL.
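To make the DSSCAM modification concrete, here is a minimal PyTorch sketch of stereo cross attention in which the 1x1 pointwise projections are replaced by a single weight-shared 3x3 depthwise convolution. This is our hedged reading of the abstract (class and variable names are ours), not the released implementation.

```python
import torch
import torch.nn as nn

class DSSCAMSketch(nn.Module):
    """Hypothetical sketch of a depth-separated stereo cross attention module:
    the 1x1 pointwise projections of SCAM are replaced by one 3x3 depthwise
    convolution (groups == channels) whose weights are shared across views."""
    def __init__(self, channels):
        super().__init__()
        self.proj = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.scale = channels ** -0.5

    def forward(self, x_l, x_r):                  # two (B, C, H, W) view features
        q = self.proj(x_l).permute(0, 2, 3, 1)    # B, H, W, C
        k = self.proj(x_r).permute(0, 2, 3, 1)    # same weights for the right view
        attn = torch.softmax(q @ k.transpose(-1, -2) * self.scale, dim=-1)  # B,H,W,W
        v = x_r.permute(0, 2, 3, 1)
        fused = (attn @ v).permute(0, 3, 1, 2)    # cross-view features for the left
        return x_l + fused
```

Attention is computed along the width dimension, which matches the epipolar geometry of a rectified stereo pair.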
https://arxiv.org/abs/2405.08423
Detecting anomalous edges in dynamic graphs aims to identify edges that deviate significantly from the normal pattern, and can be applied in various domains, such as cybersecurity, financial transactions, and AIOps. As time evolves, new types of anomalous edges emerge, and labeled anomaly samples are scarce for each type. Current methods are either designed to detect randomly inserted edges or require sufficient labeled data for model training, which limits their applicability to real-world settings. In this paper, we study this problem by leveraging the rich knowledge encoded in large language models (LLMs) and propose a method named AnomalyLLM. To align the dynamic graph with LLMs, AnomalyLLM pre-trains a dynamic-aware encoder to generate edge representations and reprograms the edges using prototypes of word embeddings. Along with the encoder, we design an in-context learning framework that integrates the information of a few labeled samples to achieve few-shot anomaly detection. Experiments on four datasets reveal that AnomalyLLM not only significantly improves few-shot anomaly detection performance, but also achieves superior results on new anomalies without any update of model parameters.
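A hedged sketch of the reprogramming step as we read it: each edge embedding from the dynamic-aware encoder is softly re-expressed over a bank of word-embedding prototypes so an LLM can consume it. The function name and the attention form are assumptions, not the paper's code.

```python
import torch
import torch.nn.functional as F

def reprogram_edges(edge_emb, word_prototypes):
    """Re-express edge representations as soft combinations of word-embedding
    prototypes (a common reprogramming pattern; details here are assumed).

    edge_emb:        (num_edges, d)  output of the dynamic-aware encoder
    word_prototypes: (num_protos, d) prototypes derived from the LLM vocabulary
    """
    scores = edge_emb @ word_prototypes.T / edge_emb.shape[-1] ** 0.5
    weights = F.softmax(scores, dim=-1)       # attention over the prototypes
    return weights @ word_prototypes          # reprogrammed tokens, (num_edges, d)

# usage with random placeholders
tokens = reprogram_edges(torch.randn(32, 768), torch.randn(100, 768))
```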
https://arxiv.org/abs/2405.07626
High-definition maps with accurate lane-level information are crucial for autonomous driving, but creating these maps is a resource-intensive process. To this end, we present a cost-effective solution for creating lane-level roadmaps using only the global navigation satellite system (GNSS) and a camera on customer vehicles. Our proposed solution utilizes a prior standard-definition (SD) map, GNSS measurements, visual odometry, and lane marking edge detection points to simultaneously estimate the vehicle's 6D pose, its position within the SD map, and the 3D geometry of traffic lines. This is achieved using a Bayesian simultaneous localization and multi-object tracking filter, where the estimation of traffic lines is formulated as a multiple extended object tracking problem, solved using a trajectory Poisson multi-Bernoulli mixture (TPMBM) filter. In TPMBM filtering, traffic lines are modeled as B-spline trajectories, and each trajectory is parameterized by a sequence of control points. The proposed solution has been evaluated using experimental data collected by a test vehicle driving on a highway. Preliminary results show that the traffic line estimates, overlaid on the satellite image, generally align with the lane markings up to some lateral offsets.
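The B-spline parameterization itself is standard; below is a minimal SciPy sketch of a clamped cubic B-spline traffic line evaluated from a sequence of control points (the control-point values are made up for illustration).

```python
import numpy as np
from scipy.interpolate import BSpline

degree = 3
ctrl = np.array([[0.0, 0.0, 0.0],
                 [10.0, 0.2, 0.0],
                 [20.0, 0.5, 0.1],
                 [30.0, 0.4, 0.1],
                 [40.0, 0.0, 0.0]])          # (n, 3) control points in 3D
n = len(ctrl)
# clamped knot vector so the curve starts/ends at the end control points
knots = np.concatenate(([0.0] * degree,
                        np.linspace(0.0, 1.0, n - degree + 1),
                        [1.0] * degree))
spline = BSpline(knots, ctrl, degree)
samples = spline(np.linspace(0.0, 1.0, 50))  # 3D points along the traffic line
```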
https://arxiv.org/abs/2405.04290
In the field of computer vision, the numerical encoding of 3D surfaces is crucial. It is classical to represent surfaces by their Signed Distance Functions (SDFs) or Unsigned Distance Functions (UDFs). For tasks like representation learning, surface classification, or surface reconstruction, this function can be learned by a neural network, called a Neural Distance Function. This network, and in particular its weights, may serve as a parametric and implicit representation of the surface. The network must represent the surface as accurately as possible. In this paper, we propose a method for learning UDFs that improves the fidelity of the obtained Neural UDF to the original 3D surface. The key idea of our method is to concentrate the learning effort of the Neural UDF on surface edges. More precisely, we show that sampling more training points around surface edges allows better local accuracy of the trained Neural UDF, and thus improves the global expressiveness of the Neural UDF in terms of Hausdorff distance. To detect surface edges, we propose a new statistical method based on the calculation of a $p$-value at each point on the surface. Our method is shown to detect surface edges more accurately than a commonly used local geometric descriptor.
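A minimal sketch of the two ingredients, under our own naming: a ground-truth UDF computed as nearest-neighbor distance to a sampled surface, and a sampler that concentrates training points around detected surface edges (the paper's $p$-value edge detector is not reproduced here).

```python
import numpy as np
from scipy.spatial import cKDTree

def udf(query_pts, surface_pts):
    """Unsigned distance of each query point to the sampled surface."""
    dist, _ = cKDTree(surface_pts).query(query_pts)
    return dist

def sample_near_edges(edge_pts, n, sigma=0.05):
    """Draw training points as Gaussian perturbations of edge points."""
    idx = np.random.randint(len(edge_pts), size=n)
    return edge_pts[idx] + sigma * np.random.randn(n, 3)

surface = np.random.rand(1000, 3)      # placeholder surface samples
edges = surface[:50]                   # placeholder detected edge points
train_x = sample_near_edges(edges, 200)
train_y = udf(train_x, surface)        # regression targets for the Neural UDF
```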
https://arxiv.org/abs/2405.03381
The rapid advancement of automated artificial intelligence algorithms and remote sensing instruments has benefited change detection (CD) tasks. However, there is still much room for improvement in precise detection, especially regarding the edge integrity and internal holes of change features. To solve these problems, we design the Change Guiding Network (CGNet) to tackle the insufficient expression of change features in the conventional U-Net structure adopted by previous methods, which causes inaccurate edge detection and internal holes. Change maps generated from deep features with rich semantic information are used as prior information to guide multi-scale feature fusion, which improves the expressive ability of change features. Meanwhile, we propose a self-attention module named the Change Guide Module (CGM), which effectively captures long-distance dependencies among pixels and overcomes the insufficient receptive field of traditional convolutional neural networks. We verify the usefulness and efficiency of CGNet on four major CD datasets, and extensive experiments and ablation studies demonstrate its effectiveness. Our code will be open-sourced at this https URL.
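A hedged PyTorch sketch of what a change-guided attention step could look like: queries derived from a coarse change map attend over the feature map, so long-range dependencies follow the changed regions. Layer shapes and names are our assumptions, not the released CGM.

```python
import torch
import torch.nn as nn

class ChangeGuideSketch(nn.Module):
    """Hypothetical change-guided attention: a coarse change map predicted
    from deep features reweights multi-scale features via self-attention."""
    def __init__(self, channels):
        super().__init__()
        self.q = nn.Conv2d(1, channels, 1)        # queries from the change map
        self.k = nn.Conv2d(channels, channels, 1)
        self.v = nn.Conv2d(channels, channels, 1)

    def forward(self, feat, change_map):          # feat: B,C,H,W; map: B,1,H,W
        b, c, h, w = feat.shape
        q = self.q(change_map).flatten(2)         # B, C, HW
        k = self.k(feat).flatten(2)               # B, C, HW
        v = self.v(feat).flatten(2)               # B, C, HW
        attn = torch.softmax(q.transpose(1, 2) @ k / c ** 0.5, dim=-1)  # B,HW,HW
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return feat + out                          # guided, residual-fused features
```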
https://arxiv.org/abs/2404.09179
We propose a novel method for geolocalizing Unmanned Aerial Vehicles (UAVs) in environments lacking Global Navigation Satellite Systems (GNSS). Current state-of-the-art techniques employ an offline-trained encoder to generate a vector representation (embedding) of the UAV's current view, which is then compared with pre-computed embeddings of geo-referenced images to determine the UAV's position. Here, we demonstrate that the performance of these methods can be significantly enhanced by preprocessing the images to extract their edges, which exhibit robustness to seasonal and illumination variations. Furthermore, we establish that utilizing edges enhances resilience to orientation and altitude inaccuracies. Additionally, we introduce a confidence criterion for localization. Our findings are substantiated through synthetic experiments.
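The preprocessing idea is simple to illustrate: extract edges with a standard detector before embedding both the UAV view and the reference tiles, and report the best match together with a similarity-based confidence. In the sketch below, `encode` is a placeholder for the offline-trained encoder, the Canny thresholds are arbitrary, and grayscale uint8 images are assumed.

```python
import cv2
import numpy as np

def encode(img):                                  # placeholder for the trained encoder
    return cv2.resize(img, (32, 32)).flatten().astype(np.float32)

def localize(view, ref_imgs, ref_positions):
    edge_view = cv2.Canny(view, 100, 200)         # edges of the UAV's current view
    q = encode(edge_view)
    embs = np.stack([encode(cv2.Canny(r, 100, 200)) for r in ref_imgs])
    sims = embs @ q / (np.linalg.norm(embs, axis=1) * np.linalg.norm(q) + 1e-8)
    best = int(np.argmax(sims))
    return ref_positions[best], float(sims[best])  # position and confidence score
```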
https://arxiv.org/abs/2404.06207
Edge detection is a long-standing problem in computer vision. Recent deep learning based algorithms achieve state-of-the-art performance on publicly available datasets. Despite the efficiency of these algorithms, their performance, however, relies heavily on the pretrained weights of the backbone network on the ImageNet dataset. This heavily limits the design space of deep learning based edge detectors: whenever we want to devise a new model, we have to train it on the ImageNet dataset first and then fine-tune it on the edge detection datasets, as the comparison would be unfair otherwise. However, training a model on the ImageNet dataset is usually not feasible for many researchers due to limited computational resources. In this work, we study the performance that state-of-the-art deep learning based edge detectors can achieve on publicly available datasets when trained from scratch, and devise a new network architecture, the multi-stream and multi-scale fusion net (msmsfnet), for edge detection. Our experiments show that, by training all models from scratch to ensure fairness of comparison, our model outperforms state-of-the-art deep learning based edge detectors on three publicly available datasets.
https://arxiv.org/abs/2404.04856
This paper presents a novel monocular depth estimation method, named ECFNet, for estimating high-quality monocular depth with clear edges and a valid overall structure from a single RGB image. We thoroughly investigate the key factors that affect the edge depth estimation of MDE networks and conclude that edge information itself plays a critical role in predicting depth details. Driven by this analysis, we propose to explicitly employ image edges as input for ECFNet and fuse the initial depths from different sources to produce the final depth. Specifically, ECFNet first uses a hybrid edge detection strategy to obtain the edge map and edge-highlighted image from the input image, and then leverages a pre-trained MDE network to infer the initial depths of the aforementioned three images. After that, ECFNet utilizes a layered fusion module (LFM) to fuse the initial depths, which are further updated by a depth consistency module (DCM) to form the final estimation. Extensive experimental results on public datasets and ablation studies indicate that our method achieves state-of-the-art performance. Project page: this https URL.
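A hedged OpenCV sketch of the input-preparation stage as described: from one image, derive an edge map (here a simple Canny/Sobel combination standing in for the paper's unspecified hybrid strategy) and an edge-highlighted image; all three images would then be fed to the pre-trained MDE network. Thresholds and the blend weight are illustrative.

```python
import cv2

def prepare_inputs(bgr):
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    canny = cv2.Canny(gray, 80, 160)
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    sobel = cv2.convertScaleAbs(cv2.magnitude(gx, gy))
    edge_map = cv2.max(canny, sobel)               # a simple "hybrid" edge map
    highlight = cv2.addWeighted(                   # overlay edges on the RGB image
        bgr, 1.0, cv2.cvtColor(edge_map, cv2.COLOR_GRAY2BGR), 0.5, 0)
    return bgr, edge_map, highlight                # the three MDE inputs
```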
https://arxiv.org/abs/2404.00373
Mask-guided matting networks have achieved significant improvements and shown great potential in practical applications in recent years. However, by simply learning matting representations from synthetic matting data that lacks real-world diversity, these approaches tend to overfit low-level details in wrong regions, lack generalization to objects with complex structures and real-world scenes such as shadows, and suffer from interference of background lines or textures. To address these challenges, in this paper we propose a novel auxiliary learning framework for mask-guided matting models, incorporating three auxiliary tasks besides matting — semantic segmentation, edge detection, and background line detection — to learn different and effective representations from different types of data and annotations. Our framework and model introduce the following key aspects: (1) to learn real-world adaptive semantic representations for objects with diverse and complex structures in real-world scenes, we introduce extra semantic segmentation and edge detection tasks on more diverse real-world data with segmentation annotations; (2) to avoid overfitting on low-level details, we propose a module that utilizes the inconsistency between learned segmentation and matting representations to regularize detail refinement; (3) we introduce a novel background line detection task into our auxiliary learning framework to suppress interference of background lines or textures. In addition, we propose a high-quality matting benchmark, Plant-Mat, to evaluate matting methods on complex structures. Extensive quantitative and qualitative results show that our approach outperforms state-of-the-art mask-guided methods.
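A minimal sketch of the joint objective implied by the framework: a matting loss combined with the three auxiliary heads. The loss forms and weights below are illustrative assumptions, not the paper's values.

```python
import torch.nn.functional as F

def auxiliary_loss(pred, target):
    """Joint loss over matting plus the three auxiliary tasks (illustrative)."""
    l_matte = F.l1_loss(pred["alpha"], target["alpha"])                       # matting
    l_seg = F.cross_entropy(pred["seg"], target["seg"])                       # segmentation
    l_edge = F.binary_cross_entropy_with_logits(pred["edge"], target["edge"]) # edges
    l_line = F.binary_cross_entropy_with_logits(pred["line"], target["line"]) # bg lines
    return l_matte + 0.5 * l_seg + 0.25 * l_edge + 0.25 * l_line
```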
https://arxiv.org/abs/2403.19213
Abstract art is an immensely popular and widely discussed form of art that often has the ability to depict the emotions of an artist. Many researchers have attempted to study abstract art with edge detection, brush stroke, and emotion recognition algorithms using machine and deep learning. This paper describes the study of a wide distribution of abstract paintings using Generative Adversarial Networks (GANs). GANs have the ability to learn and reproduce a distribution, enabling researchers and scientists to effectively explore and study the generated image space. However, the challenge lies in developing an efficient GAN architecture that overcomes common training pitfalls. This paper addresses this challenge by introducing a modified DCGAN (mDCGAN) specifically designed for high-quality artwork generation. The approach involves a thorough exploration of the modifications made, delving into the intricate workings of DCGANs, optimisation techniques, and regularisation methods aimed at improving stability and realism in art generation, enabling effective study of generated patterns. The proposed mDCGAN incorporates meticulous adjustments in layer configurations and architectural choices, offering tailored solutions to the unique demands of art generation while effectively combating issues like mode collapse and vanishing gradients. Furthermore, this paper explores the generated latent space by performing random walks to understand vector relationships between brush strokes and colours in the abstract art space, and presents a statistical analysis of unstable outputs after a certain period of GAN training, comparing their significant differences. These findings validate the effectiveness of the proposed approach, emphasising its potential to revolutionise the field of digital art generation and the digital art ecosystem.
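The latent-space random walk is easy to sketch: small Gaussian steps from a seed vector, decoded by the trained generator. In this sketch, `G` is assumed to map a (1, z_dim) latent to an image tensor; the step size is arbitrary.

```python
import torch

def random_walk(G, z_dim=100, steps=16, step_size=0.2):
    """Decode a sequence of nearby latent codes to study local structure
    of the learned distribution (a generic sketch, not the paper's code)."""
    z = torch.randn(1, z_dim)
    frames = []
    for _ in range(steps):
        z = z + step_size * torch.randn_like(z)   # perturb the latent code
        with torch.no_grad():
            frames.append(G(z))                   # nearby points in image space
    return frames
```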
https://arxiv.org/abs/2403.18397
Automatic measurement of tear meniscus height (TMH) has been achieved using deep learning techniques; however, annotation is significantly influenced by subjective factors and is both time-consuming and labor-intensive. In this paper, we introduce an automatic TMH measurement technique based on edge detection-assisted annotation within a deep learning framework. This method generates mask labels that are less affected by subjective factors and more efficient to produce than previous annotation approaches. For improved segmentation of the pupil and tear meniscus areas, the convolutional neural network Inceptionv3 was first implemented as an image quality assessment model, effectively identifying higher-quality images with an accuracy of 98.224%. Subsequently, using the generated labels, various algorithms, including Unet, ResUnet, Deeplabv3+FcnResnet101, Deeplabv3+FcnResnet50, FcnResnet50, and FcnResnet101, were trained, with Unet demonstrating the best performance. Finally, Unet was used for automatic pupil and tear meniscus segmentation to locate the center of the pupil and calculate TMH, respectively. An evaluation of the mask quality predicted by Unet indicated a Mean Intersection over Union of 0.9362, a recall of 0.9261, a precision of 0.9423, and an F1-score of 0.9326. Additionally, the TMH predicted by the model was assessed, with the fitting curve represented as y = 0.982x - 0.862, an overall correlation coefficient of r^2 = 0.961, and an accuracy of 94.80% (237/250). In summary, the algorithm can automatically screen images based on their quality, segment the pupil and tear meniscus areas, and automatically measure TMH. Measurement results using the AI algorithm demonstrate a high level of consistency with manual measurements, offering significant support to clinical doctors in diagnosing dry eye disease.
https://arxiv.org/abs/2403.15853
The maintenance, archiving, and usage of design drawings in physical form is cumbersome across industries over long periods. It is hard to extract information by simply scanning drawing sheets. Converting them to digital formats such as Computer-Aided Design (CAD), with the needed knowledge extraction, can solve this problem. The conversion of these machine drawings to digital form is a crucial challenge that requires advanced techniques. This research proposes an innovative methodology utilizing deep learning methods. The approach employs object detection models, such as YOLOv7 and Faster R-CNN, to detect physical drawing objects present in the images, followed by edge detection algorithms such as the Canny filter to extract and refine the identified lines from the drawing region, and curve detection techniques to detect circles. Ornaments (complex shapes) within the drawings are also extracted. To ensure comprehensive conversion, an Optical Character Recognition (OCR) tool is integrated to identify and extract the text elements from the drawings. The extracted data, which includes lines, shapes, and text, is consolidated and stored in a structured comma-separated values (.csv) file format. The accuracy and efficiency of the conversion are evaluated. Through this, conversion can be automated to help organizations enhance their productivity, facilitate seamless collaboration, and preserve valuable design information in an easily accessible digital format. Overall, this study contributes to the advancement of CAD conversion, providing accurate results from the translation process. Future research can focus on handling diverse drawing types and on enhanced accuracy in shape and line detection and extraction.
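A minimal OpenCV sketch of the line-extraction and consolidation stage: Canny edges, a probabilistic Hough transform for straight segments, and the results written to a .csv file. Thresholds and file names are illustrative; the object detection, circle, ornament, and OCR stages are omitted.

```python
import csv
import cv2
import numpy as np

img = cv2.imread("drawing_sheet.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input
edges = cv2.Canny(img, 50, 150)
lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=80,
                        minLineLength=40, maxLineGap=5)      # straight segments

with open("extracted_lines.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["x1", "y1", "x2", "y2"])
    for (x1, y1, x2, y2) in (lines[:, 0] if lines is not None else []):
        writer.writerow([x1, y1, x2, y2])                    # consolidated output
```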
https://arxiv.org/abs/2403.11291
We propose Texture Edge detection using Patch consensus (TEP), a training-free method to detect the boundaries of textures. We propose a new, simple way to identify texture edge locations using the consensus of segmented local patch information. On the boundary, even with local patch information, the distinction between textures is typically not clear, but neighbor consensus gives a clear indication of the boundary. We utilize local patches and their responses against neighboring regions to emphasize the similarities and differences across different textures. The segmentation of the responses further emphasizes the edge location, and the neighborhood voting provides consensus and stabilizes the edge detection. We analyze texture as a stationary process to give insight into the patch width parameter versus the quality of edge detection. We derive the necessary condition for textures to be distinguished, and analyze the patch width with respect to the scale of textures. Various experiments are presented to validate the proposed model.
https://arxiv.org/abs/2403.11038
Federated learning (FL) enables privacy preservation in model training by exposing only users' model gradients. Yet, FL users are susceptible to the gradient inversion (GI) attack, which can reconstruct ground-truth training data such as images from model gradients. However, reconstructing high-resolution images with existing GI attacks faces two challenges: inferior accuracy and slow convergence, especially when the context is complicated, e.g., when the training batch size is much greater than 1 on each FL user. To address these challenges, we present a Robust, Accurate and Fast-convergent GI attack algorithm, called RAF-GI, with two components: 1) an Additional Convolution Block (ACB), which can restore labels with up to 20% improvement compared with existing works; 2) a Total variance, three-channel mEan and cAnny edge detection regularization term (TEA), which is a white-box attack strategy that reconstructs images based on the labels inferred by ACB. Moreover, RAF-GI is robust in that it can still accurately reconstruct ground-truth data when the users' training batch size is no more than 48. Our experimental results show that RAF-GI can reduce time costs by 94% while achieving superb inversion quality on the ImageNet dataset. Notably, with a batch size of 1, RAF-GI achieves a 7.89 higher Peak Signal-to-Noise Ratio (PSNR) than the state-of-the-art baselines.
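A hedged sketch of a TEA-style regularizer as we read the acronym: total variation, a three-channel mean term, and a Canny edge prior (precomputed as a [0,1] tensor) that anchors structure in the dummy image being optimized. The exact terms and weights in RAF-GI may differ.

```python
def tea_regularizer(x, edge_prior, w_tv=1e-4, w_mean=1e-2, w_edge=1e-2):
    """x: (B, 3, H, W) dummy image; edge_prior: (B, 1, H, W) Canny edges in [0,1].
    Illustrative combination of TV, channel-mean, and edge-prior penalties."""
    dh = (x[..., 1:, :] - x[..., :-1, :]).abs()
    dw = (x[..., :, 1:] - x[..., :, :-1]).abs()
    tv = dh.mean() + dw.mean()                                 # total variation
    mean_term = (x.mean(dim=(-2, -1)) - 0.5).pow(2).mean()     # 3-channel means
    # discourage image gradients outside the Canny-detected edge regions
    edge_term = (dh * (1 - edge_prior[..., 1:, :])).mean() + \
                (dw * (1 - edge_prior[..., :, 1:])).mean()
    return w_tv * tv + w_mean * mean_term + w_edge * edge_term
```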
https://arxiv.org/abs/2403.08383
As a new distributed computing framework that can protect data privacy, federated learning (FL) has attracted increasing attention in recent years. It receives gradients from users to train the global model and releases the trained global model to participating users. Nonetheless, the gradient inversion (GI) attack reflects the risk of privacy leakage in federated learning: attackers only need the gradients and hundreds of thousands of simple iterations to obtain relatively accurate private data stored on users' local devices. To this end, some works propose simple but effective strategies to obtain user data under a single-label dataset. However, these strategies achieve a satisfactory visual quality of the inverted image at the expense of higher time costs, and due to the semantic limitations of a single label, the image obtained by gradient inversion may have semantic errors. We present a novel gradient inversion strategy based on Canny edge detection (MGIC) for both multi-label and single-label datasets. To reduce the semantic errors caused by a single label, we add new convolutional blocks to the trained model to obtain the image's multiple labels. Through multi-label representation, serious semantic errors in inverted images are reduced. We then analyze the impact of parameters on the difficulty of input image reconstruction and discuss how multiple subjects in an image affect the inversion performance. Our proposed strategy achieves better visual inversion results than the most widely used methods, saving more than 78% of time costs on the ImageNet dataset.
https://arxiv.org/abs/2403.08284
Detecting edges in images suffers from the problems of (P1) heavy imbalance between positive and negative classes and (P2) label uncertainty owing to disagreement between different annotators. Existing solutions address P1 using class-balanced cross-entropy loss and dice loss, and P2 by only predicting edges agreed upon by most annotators. In this paper, we propose RankED, a unified ranking-based approach that addresses both the imbalance problem (P1) and the uncertainty problem (P2). RankED tackles these two problems with two components: one that ranks positive pixels over negative pixels, and a second that promotes high-confidence edge pixels to have more label certainty. We show that RankED outperforms previous studies and sets a new state-of-the-art on the NYUD-v2, BSDS500, and Multi-cue datasets. Code is available at this https URL.
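The ranking component is straightforward to illustrate with a subsampled pairwise hinge loss that scores positive (edge) pixels above negative ones; RankED's actual objective may take a different form.

```python
import torch

def pairwise_rank_loss(scores, labels, margin=1.0, n_pairs=4096):
    """scores, labels: flattened per-pixel tensors. Samples random
    positive/negative pairs and penalizes mis-ordered ones (illustrative)."""
    pos = scores[labels > 0.5]
    neg = scores[labels <= 0.5]
    i = torch.randint(len(pos), (n_pairs,))
    j = torch.randint(len(neg), (n_pairs,))
    return torch.clamp(margin - (pos[i] - neg[j]), min=0).mean()
```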
https://arxiv.org/abs/2403.01795
Accurate segmentation of COVID-19 CT images is crucial for reducing the severity and mortality rates associated with COVID-19 infections. In response to the blurred boundaries and high variability characteristic of lesion areas in COVID-19 CT images, we introduce CDSE-UNet: a novel UNet-based segmentation model that integrates Canny operator edge detection and a dual-path SENet feature fusion mechanism. This model enhances the standard UNet architecture by employing the Canny operator for edge detection in sample images, paralleled with a similar network structure for semantic feature extraction. A key innovation is the Double SENet Feature Fusion Block, applied across corresponding network layers to effectively combine features from both image paths. Moreover, we have developed a multiscale convolution approach, replacing the standard convolution in UNet, to adapt to the varied lesion sizes and shapes. This addition not only aids in accurately classifying lesion edge pixels but also significantly improves channel differentiation and expands the capacity of the model. Our evaluations on public datasets demonstrate CDSE-UNet's superior performance over other leading models, particularly in segmenting large and small lesion areas, accurately delineating lesion edges, and effectively suppressing noise.
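The SENet building block itself is standard; a minimal PyTorch version is below. How the Double SENet Feature Fusion Block combines the two paths is not detailed in the abstract, so only the per-path unit is shown.

```python
import torch.nn as nn

class SEBlock(nn.Module):
    """Standard squeeze-and-excitation block: global pooling produces
    per-channel weights that rescale the feature map."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                        # x: (B, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))          # squeeze: global average pool
        return x * w[:, :, None, None]           # excite: per-channel rescale
```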
https://arxiv.org/abs/2403.01513
Edge detection as a pre-processing stage is a fundamental and important aspect of number plate extraction systems. This is because a particular vehicle can be identified by its number plate, as each number plate is unique to a vehicle. As such, the characters of a number plate, which differ in lines and shapes, can be extracted using the principle of edge detection. This paper presents a method for number plate extraction using edge detection techniques. Edges in number plates are identified by changes in the intensity of pixel values; these edges are therefore identified using either a single-pixel-based or a collection-of-pixels-based approach. The efficiency of these edge detection algorithms for number plate extraction is evaluated experimentally in both noisy and clean environments. Experimental results are obtained in MATLAB 2017b using the Pratt Figure of Merit (PFOM) as a performance metric.
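For reference, here is a sketch of Pratt's Figure of Merit between binary edge maps, following the usual formulation FOM = (1/max(N_I, N_D)) * sum_i 1/(1 + alpha*d_i^2) with alpha = 1/9; MATLAB implementations may differ in normalization details.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def pratt_fom(detected, ideal, alpha=1.0 / 9.0):
    """detected, ideal: boolean edge maps of the same shape."""
    d = distance_transform_edt(~ideal)         # distance to nearest ideal edge
    n_d, n_i = detected.sum(), ideal.sum()
    score = (1.0 / (1.0 + alpha * d[detected] ** 2)).sum()
    return score / max(n_d, n_i)               # 1.0 means a perfect match
```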
https://arxiv.org/abs/2402.18251
This work focuses on using advanced techniques for structural health monitoring (SHM) of bridges with ongoing traffic. We propose an approach using deep reinforcement learning (DRL)-based control of an Unmanned Aerial Vehicle (UAV). Our approach conducts a concrete bridge deck survey while traffic is ongoing and detects cracks. The UAV performs the crack detection, and the locations of cracks are initially unknown. We use two edge detection techniques. First, we use Canny edge detection for crack detection. We also use a Convolutional Neural Network (CNN) for crack detection and compare it with Canny edge detection. Transfer learning is applied using a CNN with pre-trained weights obtained from a crack image dataset, enabling the model to adapt and improve its performance in identifying and localizing cracks. Proximal Policy Optimization (PPO) is applied for UAV control and bridge surveys. Experimentation across various scenarios is performed to evaluate the performance of the proposed methodology. Key metrics such as task completion time and reward convergence are observed to gauge the effectiveness of the approach. We observe that the Canny edge detector offers up to 40% lower task completion time, while the CNN excels with up to 12% better damage detection and 1.8 times better rewards.
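A minimal sketch of the transfer-learning setup: a CNN initialized with pre-trained weights and re-headed for binary crack classification. A torchvision ResNet-18 is used here as a stand-in, since the abstract does not state the exact backbone (the weights API requires torchvision >= 0.13).

```python
import torch.nn as nn
from torchvision import models

# Load ImageNet-pretrained weights, freeze the feature extractor, and attach
# a trainable two-class (crack / no-crack) head.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in model.parameters():
    p.requires_grad = False                      # freeze backbone features
model.fc = nn.Linear(model.fc.in_features, 2)    # trainable crack classifier head
```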
https://arxiv.org/abs/2402.14757
Light plays a vital role in vision, whether human or machine; the perceived color is always based on the lighting conditions of the surroundings. Researchers are working to enhance color detection techniques for computer vision applications. They have proposed several methods using different color detection approaches, but there is still a gap to be filled. To address this issue, a color detection method based on a Convolutional Neural Network (CNN) is proposed. First, image segmentation is performed using an edge detection segmentation technique to isolate the object, and then the segmented object is fed to a Convolutional Neural Network trained to detect the color of an object under different lighting conditions. It is experimentally verified that our method can substantially enhance the robustness of color detection under different lighting conditions, and our method performs better than existing methods.
https://arxiv.org/abs/2402.04762