The human brain exhibits a strong ability to spontaneously associate different visual attributes of the same or similar visual scenes, such as linking sketches and graffiti to real-world visual objects, usually without any supervisory information. In contrast, in the field of artificial intelligence, controllable generation methods such as ControlNet rely heavily on annotated training data such as depth maps, semantic segmentation maps, and poses, which limits their scalability. Inspired by the neural mechanisms that may contribute to the brain's associative power, specifically cortical modularization and hippocampal pattern completion, we propose a self-supervised controllable generation (SCG) framework. First, we introduce an equivariant constraint to promote inter-module independence and intra-module correlation in a modular autoencoder network, thereby achieving functional specialization. Based on these specialized modules, we then employ a self-supervised pattern-completion approach for controllable generation training. Experimental results demonstrate that the proposed modular autoencoder effectively achieves functional specialization, including modular processing of color, brightness, and edge detection, and exhibits brain-like features such as orientation selectivity, color antagonism, and center-surround receptive fields. Through self-supervised training, associative generation capabilities emerge spontaneously in SCG, which generalizes well to various tasks such as associative generation on paintings, sketches, and ancient graffiti. Compared with the representative prior method ControlNet, our approach not only demonstrates superior robustness in more challenging high-noise scenarios but also offers more promising scalability thanks to its self-supervised nature.
https://arxiv.org/abs/2409.18694
Single-photon cameras (SPCs) are emerging as sensors of choice for various challenging imaging applications. One class of SPCs based on the single-photon avalanche diode (SPAD) detects individual photons using an avalanche process; the raw photon data can then be processed to extract scene information under extremely low light, high dynamic range, and rapid motion. Yet, single-photon sensitivity in SPADs comes at a cost: each photon detection consumes more energy than in a CMOS camera. This avalanche power significantly limits sensor resolution and could restrict widespread adoption of SPAD-based SPCs. We propose a computational-imaging approach called photon inhibition to address this challenge. Photon inhibition strategically allocates detections in space and time based on downstream inference task goals and resource constraints. We develop lightweight, on-sensor computational inhibition policies that use past photon data to disable SPAD pixels in real time, so as to select the most informative future photons. As case studies, we design policies tailored for image reconstruction and edge detection and demonstrate, via both simulations and real SPC-captured data, a considerable reduction in photon detections (over 90% of photons) while maintaining task performance metrics. Our work raises the question of "which photons should be detected?" and paves the way for future energy-efficient single-photon imaging.
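A minimal sketch of the inhibition idea, assuming a toy budget-per-window rule rather than the paper's actual on-sensor policies: each pixel is allowed only a fixed number of detections per time window, and later photons at that pixel are gated off.

    import numpy as np

    def inhibition_policy(detections, budget_per_pixel=4, window=32):
        """Toy photon-inhibition policy (illustrative only, not the paper's method).

        detections: (T, H, W) binary array of per-frame photon detections.
        Returns a (T, H, W) mask of detections that would have been allowed:
        a pixel is gated off for the rest of a window once it has spent its
        detection budget, mimicking on-sensor inhibition.
        """
        T, H, W = detections.shape
        allowed = np.zeros((T, H, W), dtype=bool)
        counts = np.zeros((H, W), dtype=np.int32)
        for t in range(T):
            if t % window == 0:          # refresh budgets at each window boundary
                counts[:] = 0
            enabled = counts < budget_per_pixel
            allowed[t] = detections[t].astype(bool) & enabled
            counts += allowed[t]
        return allowed

    # Example with random photon arrivals: what fraction of detections survive?
    rng = np.random.default_rng(0)
    photons = rng.random((128, 64, 64)) < 0.3
    kept = inhibition_policy(photons)
    print("fraction of detections kept:", kept.sum() / photons.sum())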
https://arxiv.org/abs/2409.18337
Crack detection, particularly from pavement images, presents a formidable challenge in computer vision due to several inherent complexities, such as intensity inhomogeneity, intricate topologies, low contrast, and noisy backgrounds. Automated crack detection is crucial for maintaining the structural integrity of essential infrastructure, including buildings, pavements, and bridges. Existing lightweight methods often suffer from computational inefficiency, complex crack patterns, and difficult backgrounds, leading to inaccurate detection and impracticality for real-world applications. To address these limitations, we propose EfficientCrackNet, a lightweight hybrid model combining Convolutional Neural Networks (CNNs) and transformers for precise crack segmentation. EfficientCrackNet integrates depthwise separable convolution (DSC) layers and a MobileViT block to capture both global and local features. The model employs an Edge Extraction Method (EEM) for efficient crack edge detection without pretraining, and an Ultra-Lightweight Subspace Attention Module (ULSAM) to enhance feature extraction. Extensive experiments on three benchmark datasets (Crack500, DeepCrack, and GAPs384) demonstrate that EfficientCrackNet achieves superior performance compared to existing lightweight models while requiring only 0.26M parameters and 0.483G FLOPs. The proposed model offers an optimal balance between accuracy and computational efficiency, outperforming state-of-the-art lightweight models and providing a robust and adaptable solution for real-world crack segmentation.
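For readers unfamiliar with the building block, the sketch below shows a standard depthwise separable convolution layer in PyTorch, the kind of layer lightweight segmentation models such as EfficientCrackNet rely on to cut parameters and FLOPs; the channel sizes and surrounding architecture are illustrative, not the paper's.

    import torch
    import torch.nn as nn

    class DepthwiseSeparableConv(nn.Module):
        """Depthwise separable convolution: a depthwise 3x3 followed by a
        pointwise 1x1, replacing a standard convolution at a fraction of the
        parameter and FLOP cost."""
        def __init__(self, in_ch, out_ch, stride=1):
            super().__init__()
            self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                       padding=1, groups=in_ch, bias=False)
            self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
            self.bn = nn.BatchNorm2d(out_ch)
            self.act = nn.ReLU(inplace=True)

        def forward(self, x):
            return self.act(self.bn(self.pointwise(self.depthwise(x))))

    x = torch.randn(1, 32, 128, 128)
    print(DepthwiseSeparableConv(32, 64)(x).shape)  # torch.Size([1, 64, 128, 128])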
https://arxiv.org/abs/2409.18099
The performance of deep learning-based edge detectors has far exceeded that of humans, but huge computational cost and complex training strategies hinder their further development and application. In this paper, we eliminate these complexities with a vanilla encoder-decoder based detector. First, we design a bilateral encoder to decouple the extraction of location features and semantic features. Since the location branch no longer provides cues for the semantic branch, the richness of the features can be further compressed, which is the key to making our model more compact. We then propose a cascaded feature fusion decoder, in which the location features are progressively refined by semantic features. The refined location features are the only basis for generating the edge map; the coarse original location features and the semantic features are kept from direct contact with the final result. Thus, noise in the location features and localization error in the semantic features can be suppressed in the generated edge map. The proposed New Baseline for Edge Detection (NBED) achieves superior performance consistently across multiple edge detection benchmarks, even compared with methods that incur huge computational cost and complex training strategies. The ODS of NBED on BSDS500 is 0.838, achieving state-of-the-art performance. Our study shows that what really matters in current edge detection is high-quality features, and that an encoder-decoder based detector can be made great again even without complex training strategies and huge computational cost. The code is available at this https URL.
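A rough sketch of the cascaded-refinement idea, under the assumption that location features stay at high resolution and are repeatedly fused with upsampled semantic features before a single prediction head; the module names, channel widths, and fusion details are placeholders, not NBED's exact decoder.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class RefineStage(nn.Module):
        """One cascade stage: semantic features are upsampled to the location
        feature resolution and used to refine them."""
        def __init__(self, loc_ch, sem_ch):
            super().__init__()
            self.fuse = nn.Sequential(
                nn.Conv2d(loc_ch + sem_ch, loc_ch, 3, padding=1),
                nn.BatchNorm2d(loc_ch), nn.ReLU(inplace=True))

        def forward(self, loc, sem):
            sem = F.interpolate(sem, size=loc.shape[-2:], mode="bilinear",
                                align_corners=False)
            return self.fuse(torch.cat([loc, sem], dim=1))

    class CascadedDecoder(nn.Module):
        def __init__(self, loc_ch=32, sem_chs=(64, 128, 256)):
            super().__init__()
            self.stages = nn.ModuleList(RefineStage(loc_ch, c) for c in sem_chs)
            # the edge map is predicted from the refined location features only
            self.head = nn.Conv2d(loc_ch, 1, 1)

        def forward(self, loc, sems):
            for stage, sem in zip(self.stages, sems):
                loc = stage(loc, sem)
            return torch.sigmoid(self.head(loc))

    loc = torch.randn(1, 32, 160, 160)
    sems = [torch.randn(1, 64, 80, 80), torch.randn(1, 128, 40, 40),
            torch.randn(1, 256, 20, 20)]
    print(CascadedDecoder()(loc, sems).shape)  # torch.Size([1, 1, 160, 160])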
https://arxiv.org/abs/2409.14976
In this work, we introduce OmniGen, a new diffusion model for unified image generation. Unlike popular diffusion models (e.g., Stable Diffusion), OmniGen no longer requires additional modules such as ControlNet or IP-Adapter to process diverse control conditions. OmniGen is characterized by the following features: 1) Unification: OmniGen not only demonstrates text-to-image generation capabilities but also inherently supports other downstream tasks, such as image editing, subject-driven generation, and visual-conditional generation. Additionally, OmniGen can handle classical computer vision tasks, such as edge detection and human pose recognition, by transforming them into image generation tasks. 2) Simplicity: The architecture of OmniGen is highly simplified, eliminating the need for additional text encoders. Moreover, it is more user-friendly than existing diffusion models, enabling complex tasks to be accomplished through instructions without extra preprocessing steps (e.g., human pose estimation), thereby significantly simplifying the image generation workflow. 3) Knowledge Transfer: Through learning in a unified format, OmniGen effectively transfers knowledge across different tasks, manages unseen tasks and domains, and exhibits novel capabilities. We also explore the model's reasoning capabilities and potential applications of the chain-of-thought mechanism. This work represents the first attempt at a general-purpose image generation model, and several issues remain unresolved. We will open-source the related resources at this https URL to foster advancements in this field.
https://arxiv.org/abs/2409.11340
Edge detection, as a fundamental task in computer vision, has garnered increasing attention. The advent of deep learning has significantly advanced this field. However, recent deep learning-based methods that rely on large-scale pre-trained weights cannot be trained from scratch, and very limited research has addressed this issue. This paper proposes a novel cycle pixel difference convolution (CPDC), which effectively integrates image gradient information with modern convolution operations. Based on the CPDC, we develop a U-shaped encoder-decoder model named CPD-Net, which is a purely end-to-end network. Additionally, to address the edge thickness produced by most existing methods, we construct a multi-scale information enhancement module (MSEM) to enhance the discriminative ability of the model, thereby generating crisp and clean contour maps. Comprehensive experiments on three standard benchmarks demonstrate that our method achieves competitive performance on the BSDS500 (ODS=0.813), NYUD-V2 (ODS=0.760), and BIPED (ODS=0.898) datasets. Our approach provides a novel perspective for addressing these challenges in edge detection.
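The paper's cycle pixel difference convolution is its own design; the sketch below shows only the generic pixel-difference idea in its common central-difference form, where the convolution response is computed relative to the center pixel so that gradient information is folded into the operation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CentralPixelDifferenceConv(nn.Module):
        """Generic pixel-difference convolution (central-difference variant,
        not the paper's cycle form). The output is conv(x) minus the kernel
        mass applied to the centre pixel, so the layer responds to local
        gradients rather than raw intensity."""
        def __init__(self, in_ch, out_ch, theta=0.7):
            super().__init__()
            self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False)
            self.theta = theta

        def forward(self, x):
            out = self.conv(x)
            # the summed 3x3 kernel acts on the centre pixel only (a 1x1 conv)
            kernel_sum = self.conv.weight.sum(dim=(2, 3), keepdim=True)
            center = F.conv2d(x, kernel_sum)
            return out - self.theta * center

    x = torch.randn(1, 3, 64, 64)
    print(CentralPixelDifferenceConv(3, 16)(x).shape)  # torch.Size([1, 16, 64, 64])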
https://arxiv.org/abs/2409.04272
Edge detection in images is the foundation of many complex tasks in computer graphics. Due to the feature loss caused by multi-layer convolution and pooling architectures, learning-based edge detection models often produce thick edges and struggle to detect the edges of small objects. Inspired by state space models, this paper presents an edge detection algorithm that effectively addresses these issues. The presented algorithm obtains state space variables of the image from dual input channels with minimal down-sampling and uses these state variables for real-time learning and memorization of image texture. Additionally, to achieve precise edges while filtering out false edges, a post-processing algorithm called wind erosion is designed to handle the binary edge map. To further enhance processing speed, we design parallel computing circuits for the most computationally intensive parts of the algorithm, significantly improving computational speed and efficiency. Experimental results demonstrate that the proposed algorithm achieves precise thin-edge localization and exhibits noise suppression across various types of images. With the parallel computing circuits, the algorithm achieves processing speeds exceeding 30 FPS on 5K images.
https://arxiv.org/abs/2409.01609
Three-dimensional (3D) reconstruction from two-dimensional images is an active research field in computer vision, with applications ranging from navigation and object tracking to segmentation and three-dimensional modeling. Traditionally, parametric techniques have been employed for this task. However, recent advancements have seen a shift towards learning-based methods. Given the rapid pace of research and the frequent introduction of new image matching methods, it is essential to evaluate them. In this paper, we present a comprehensive evaluation of various image matching methods using a structure-from-motion pipeline. We assess the performance of these methods on both in-domain and out-of-domain datasets, identifying key limitations in both the methods and benchmarks. We also investigate the impact of edge detection as a pre-processing step. Our analysis reveals that image matching for 3D reconstruction remains an open challenge, necessitating careful selection and tuning of models for specific scenarios, while also highlighting mismatches in how metrics currently represent method performance.
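As a concrete illustration of edge detection as a pre-processing step for matching (not the paper's evaluation pipeline), the following sketch optionally replaces the input images with Canny edge maps before extracting and matching ORB features; the file names and thresholds are placeholders.

    import cv2

    def match_with_edge_preprocessing(path_a, path_b, use_edges=True):
        """Minimal ORB + brute-force Hamming matching, with optional Canny
        edge maps as a pre-processing step. Illustrative only."""
        img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
        img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)
        if use_edges:
            img_a = cv2.Canny(img_a, 100, 200)
            img_b = cv2.Canny(img_b, 100, 200)
        orb = cv2.ORB_create(nfeatures=2000)
        kp_a, des_a = orb.detectAndCompute(img_a, None)
        kp_b, des_b = orb.detectAndCompute(img_b, None)
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = matcher.match(des_a, des_b)
        return sorted(matches, key=lambda m: m.distance)

    # matches = match_with_edge_preprocessing("view1.jpg", "view2.jpg")
    # print(len(matches), "matches")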
https://arxiv.org/abs/2408.16445
Lung cancer stands as the preeminent cause of cancer-related mortality globally. Prompt and precise diagnosis, coupled with effective treatment, is imperative to reduce the fatality rates associated with this formidable disease. This study introduces a cutting-edge deep learning framework for the classification of lung cancer from CT scan imagery. The research encompasses a suite of image pre-processing strategies, notably Canny edge detection and wavelet transformations, which precede the extraction of salient features and subsequent classification via a Multi-Layer Perceptron (MLP). The optimization process is further refined using the Dragonfly Algorithm (DA). The proposed methodology attains an impressive training and testing accuracy of 99.82%, underscoring its efficacy and reliability in the accurate diagnosis of lung cancer.
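A hedged sketch of such a pre-processing and classification pipeline, using OpenCV's Canny detector, a single-level Haar wavelet decomposition, and scikit-learn's MLP; the thresholds, wavelet, image size, and data paths are placeholders, and the Dragonfly Algorithm optimization step is omitted.

    import cv2
    import numpy as np
    import pywt
    from sklearn.neural_network import MLPClassifier

    def extract_features(ct_slice_path):
        """Canny edges plus a single-level Haar wavelet decomposition,
        flattened into one feature vector for an MLP. Parameters are
        illustrative, not the paper's values."""
        img = cv2.imread(ct_slice_path, cv2.IMREAD_GRAYSCALE)
        img = cv2.resize(img, (128, 128))
        edges = cv2.Canny(img, 50, 150)
        cA, (cH, cV, cD) = pywt.dwt2(img.astype(np.float32), "haar")
        return np.concatenate([edges.ravel() / 255.0,
                               cA.ravel(), cH.ravel(), cV.ravel(), cD.ravel()])

    # scan_paths / labels below are hypothetical placeholders for your own data:
    # X = np.stack([extract_features(p) for p in scan_paths])
    # y = labels  # e.g. 0 = benign, 1 = malignant
    # clf = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=500).fit(X, y)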
https://arxiv.org/abs/2408.15355
White blood cells (WBCs) are the most diverse cell types observed in the healing process of injured skeletal muscles. In the course of healing, WBCs exhibit dynamic cellular responses and undergo multiple protein expression changes. The progress of healing can be analyzed by quantifying the number of WBCs or the amount of specific proteins in light microscopy images obtained at different time points after injury. In this paper, we propose an automated quantification and analysis framework to analyze WBCs using light microscopy images of uninjured and injured muscles. The proposed framework is based on a Localized Iterative (LI) Otsu's threshold method with muscle edge detection and region-of-interest extraction. Compared with the threshold methods used in ImageJ, the LI Otsu's threshold method is more resistant to background areas and achieves better accuracy. Results on CD68-positive cells are presented to demonstrate the effectiveness of the proposed work.
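One plausible reading of a localized Otsu threshold, sketched below as tile-wise Otsu thresholding; the paper's Localized Iterative Otsu method adds iteration and muscle-edge constraints that are not reproduced here.

    import numpy as np
    from skimage.filters import threshold_otsu

    def localized_otsu(image, tile=64):
        """Apply Otsu's threshold independently inside each tile of the image,
        so a bright background in one region does not skew the threshold in
        another. Sketch of the 'localized' idea only."""
        h, w = image.shape
        mask = np.zeros_like(image, dtype=bool)
        for y in range(0, h, tile):
            for x in range(0, w, tile):
                patch = image[y:y + tile, x:x + tile]
                if patch.max() > patch.min():        # Otsu needs >1 grey level
                    mask[y:y + tile, x:x + tile] = patch > threshold_otsu(patch)
        return mask

    rng = np.random.default_rng(1)
    demo = rng.normal(100, 20, (256, 256)).astype(np.float32)
    demo[64:128, 64:192] += 80                        # synthetic bright "cells"
    print(localized_otsu(demo).mean())                # fraction of foreground pixels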
https://arxiv.org/abs/2409.06722
In this paper, a color edge detection strategy based on collaborative filtering combined with multiscale gradient fusion is proposed. Block-matching and 3D filtering (BM3D) is used to enhance the sparse representation in the transform domain and achieve denoising, whereas the multiscale gradient fusion compensates for the loss of details in single-scale edge detection and improves edge detection resolution and quality. First, the RGB images in the dataset are converted to the XYZ color space. Second, the color block-matching and 3D filtering (CBM3D) method is applied to the sparse images to remove noise interference. Then, the vector gradients of the color image and the anisotropic Gaussian directional derivatives at two scale parameters are calculated and averaged pixel by pixel to obtain a new edge strength map. Finally, the edge features are enhanced by image normalization and non-maximum suppression, and on that basis the edge contour is obtained by double-threshold selection and a new morphological refinement method. Experimental analysis on an edge detection dataset shows that the proposed method has good noise robustness and high edge quality, outperforming Color Sobel, Color Canny, SE, and Color AGDD on the PR curve, AUC, PSNR, MSE, and FOM indicators.
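A simplified sketch of the color-space conversion and multiscale gradient-fusion steps, with isotropic Gaussian derivative magnitudes standing in for the anisotropic Gaussian directional derivatives, and the CBM3D denoising step omitted.

    import numpy as np
    from skimage import io, color
    from scipy.ndimage import gaussian_gradient_magnitude

    def multiscale_edge_strength(rgb, sigmas=(1.0, 2.5)):
        """Convert RGB to XYZ, take Gaussian gradient magnitudes at two scales
        per channel, and average them pixel by pixel into one edge-strength
        map. Denoising, NMS, and double thresholding are left out."""
        xyz = color.rgb2xyz(rgb)
        strength = np.zeros(xyz.shape[:2])
        for sigma in sigmas:
            grads = [gaussian_gradient_magnitude(xyz[..., c], sigma) for c in range(3)]
            strength += np.linalg.norm(np.stack(grads, axis=-1), axis=-1)
        strength /= len(sigmas)
        # normalize to [0, 1]
        return (strength - strength.min()) / (strength.max() - strength.min() + 1e-8)

    # edge_map = multiscale_edge_strength(io.imread("image.png")[..., :3] / 255.0)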
https://arxiv.org/abs/2408.14013
Transformers, renowned for their powerful feature extraction capabilities, have played an increasingly prominent role in various vision tasks. In particular, recent advances have introduced transformers with hierarchical structures, such as the Dilated Neighborhood Attention Transformer (DiNAT), demonstrating an outstanding ability to efficiently capture both global and local features. However, the application of transformers to edge detection has not been fully exploited. In this paper, we propose EdgeNAT, a one-stage transformer-based edge detector with DiNAT as the encoder, capable of extracting object boundaries and meaningful edges both accurately and efficiently. On the one hand, EdgeNAT captures global contextual information and detailed local cues with DiNAT; on the other hand, it enhances feature representation with a novel SCAF-MLA decoder that exploits both inter-spatial and inter-channel relationships of the feature maps. Extensive experiments on multiple datasets show that our method achieves state-of-the-art performance on both RGB and depth images. Notably, on the widely used BSDS500 dataset, our L model achieves ODS and OIS F-measures of 86.0% and 87.6% for multi-scale input, and 84.9% and 86.3% for single-scale input, surpassing the current state-of-the-art EDTER by 1.2%, 1.1%, 1.7%, and 1.6%, respectively. Moreover, in terms of throughput, our approach runs at 20.87 FPS on an RTX 4090 GPU with single-scale input. The code for our method will be released soon.
https://arxiv.org/abs/2408.10527
Edge detection is crucial in medical image processing, enabling precise extraction of structural information to support lesion identification and image analysis. Traditional edge detection models typically rely on complex Convolutional Neural Network and Vision Transformer architectures. Due to their numerous parameters and high computational demands, these models are limited in their application on resource-constrained devices. This paper presents an ultra-lightweight edge detection model (UHNet), characterized by its minimal parameter count, rapid computation speed, negligible pre-training cost, and commendable performance. UHNet boasts impressive performance metrics with 42.3k parameters, 166 FPS, and 0.79G FLOPs. By employing an innovative feature extraction module and an optimized residual connection method, UHNet significantly reduces model complexity and computational requirements. Additionally, a lightweight feature fusion strategy is explored, enhancing detection accuracy. Experimental results on the BSDS500, NYUD, and BIPED datasets validate that UHNet achieves remarkable edge detection performance while maintaining high efficiency. This work not only provides new insights into the design of lightweight edge detection models but also demonstrates the potential and application prospects of the UHNet model in engineering applications such as medical image processing. The codes are available at this https URL
https://arxiv.org/abs/2408.04258
Detection of Graphical User Interface (GUI) elements is a crucial task for automatic code generation from images and sketches, GUI testing, and GUI search. Recent studies have leveraged both old-fashioned and modern computer vision (CV) techniques. Old-fashioned methods utilize classic image processing algorithms (e.g., edge detection and contour detection), while modern methods use mature deep learning solutions for general object detection tasks. GUI element detection, however, is a domain-specific case of object detection, in which objects overlap more often and are located very close to each other; the number of object classes is considerably lower, yet there are more objects per image than in natural images. Hence, prior studies comparing various object detection models might not apply to GUI element detection. In this study, we evaluate the performance of the four most recent successful YOLO models for general object detection on GUI element detection and investigate their accuracy in detecting various GUI elements.
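For context, a minimal version of the old-fashioned pipeline referred to above (edge detection plus contour detection) can be sketched as follows; the thresholds and minimum-area filter are placeholders, and this is not the method evaluated in the paper.

    import cv2

    def detect_gui_elements(screenshot_path, min_area=400):
        """Old-fashioned GUI element detection: Canny edges, contour
        extraction, and bounding boxes for sufficiently large regions."""
        img = cv2.imread(screenshot_path)
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        edges = cv2.Canny(gray, 50, 150)
        # close small gaps so widget borders form closed contours
        edges = cv2.dilate(edges, cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3)))
        contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        boxes = [cv2.boundingRect(c) for c in contours
                 if cv2.contourArea(c) >= min_area]
        return boxes  # list of (x, y, w, h) candidate GUI element regions

    # for x, y, w, h in detect_gui_elements("screenshot.png"):
    #     print(x, y, w, h)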
https://arxiv.org/abs/2408.03507
Hedges allow speakers to mark utterances as provisional, whether to signal non-prototypicality or "fuzziness", to indicate a lack of commitment to an utterance, to attribute responsibility for a statement to someone else, to invite input from a partner, or to soften critical feedback in the service of face-management needs. Here we focus on hedges in an experimentally parameterized corpus of 63 Roadrunner cartoon narratives spontaneously produced from memory by 21 speakers for co-present addressees, transcribed to text (Galati and Brennan, 2010). We created a gold standard of hedges annotated by human coders (the Roadrunner-Hedge corpus) and compared three LLM-based approaches for hedge detection: fine-tuning BERT, and zero- and few-shot prompting with GPT-4o and LLaMA-3. The best-performing approach was a fine-tuned BERT model, followed by few-shot GPT-4o. After an error analysis of the top-performing approaches, we used an LLM-in-the-loop approach to improve the gold-standard coding and to highlight cases in which hedges are ambiguous in linguistically interesting ways that will guide future research. This is the first step in our research program to train LLMs to interpret and generate collateral signals appropriately and meaningfully in conversation.
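A minimal sketch of the fine-tuned BERT baseline using the Hugging Face transformers Trainer; the two example utterances, label scheme, and hyperparameters are placeholders rather than the Roadrunner-Hedge corpus or the paper's settings.

    from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                              Trainer, TrainingArguments)
    from datasets import Dataset

    # Toy stand-in for hedge-labelled utterances (label 1 = contains a hedge).
    data = Dataset.from_dict({
        "text": ["and then he sort of runs off the cliff",
                 "the coyote falls into the canyon"],
        "label": [1, 0],
    })

    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    data = data.map(lambda b: tok(b["text"], truncation=True,
                                  padding="max_length", max_length=64),
                    batched=True)

    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2)
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="hedge-bert", num_train_epochs=1,
                               per_device_train_batch_size=2),
        train_dataset=data,
    )
    trainer.train()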
https://arxiv.org/abs/2408.03319
Image edge detection (ED) is a fundamental task in computer vision. While the performance of ED algorithms has been greatly improved by introducing CNN-based models, current models still suffer from unsatisfactory precision rates, especially when only a low error-toleration distance is allowed. Therefore, model architectures for more precise predictions still need investigation. On the other hand, the unavoidable noise in human-provided training data leads to unsatisfactory model predictions even when the inputs are edge maps themselves, which also needs improvement. In this paper, more precise ED models are presented with cascaded skipping density blocks (CSDB). Our models obtain state-of-the-art (SOTA) predictions on several datasets, especially in average precision rate (AP), as confirmed by extensive experiments. Moreover, our models do not include down-sampling operations, demonstrating that these widely adopted operations are not necessary. In addition, a novel modification to the training data augmentation is employed, which allows noiseless data to be used in model training and thus improves the performance of models predicting on edge maps themselves.
https://arxiv.org/abs/2407.19992
A significant challenge in electroencephalogram (EEG) analysis lies in the fact that current data representations involve multiple electrode signals, resulting in data redundancy and dominant lead information. However, extensive research on EEG classification focuses on designing model architectures without tackling these underlying issues. Moreover, there has been a notable gap in addressing data preprocessing for EEG, leading to considerable computational overhead in deep learning (DL) pipelines. In light of these issues, we propose a simple yet effective approach for EEG data pre-processing. Our method first transforms the EEG data into an encoded image via Inverted Channel-wise Magnitude Homogenization (ICWMH) to mitigate inter-channel biases. Next, we apply an edge detection technique to the EEG-encoded image, combined with a skip connection, to emphasize the most significant transitions in the data while preserving structural and invariant information. By doing so, we can improve the EEG learning process efficiently without using a huge DL network. Our experimental evaluations show significant improvements (from 2% to 5%) over current baselines.
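A loose sketch of the pre-processing spirit described above: per-channel magnitude homogenization of the EEG (a guess at what ICWMH-style normalization might look like, not the paper's definition) followed by Sobel edge emphasis added back through a skip connection.

    import numpy as np
    from scipy import ndimage

    def encode_eeg_image(eeg, eps=1e-8):
        """Stack channels into an image and homogenize per-channel magnitudes
        so no single electrode dominates. Assumed form, not the paper's ICWMH."""
        # eeg: (channels, samples) array
        return (eeg - eeg.mean(axis=1, keepdims=True)) / (
            np.abs(eeg).max(axis=1, keepdims=True) + eps)

    def edge_emphasis(image):
        """Sobel gradient magnitude of the encoded image, added back as a
        skip connection so sharp transitions are emphasized without losing
        the original structure."""
        gx = ndimage.sobel(image, axis=0)
        gy = ndimage.sobel(image, axis=1)
        edges = np.hypot(gx, gy)
        return image + edges / (edges.max() + 1e-8)

    rng = np.random.default_rng(0)
    eeg = rng.normal(size=(32, 256)) * rng.uniform(0.5, 5.0, size=(32, 1))
    enhanced = edge_emphasis(encode_eeg_image(eeg))
    print(enhanced.shape)  # (32, 256)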
https://arxiv.org/abs/2407.20247
Segmentation is often the first step in many medical image analysis workflows. Deep learning approaches, while giving state-of-the-art accuracy, are data-intensive and do not scale well to low-data regimes. We introduce Deep Conditional Shape Models 2.0 (DCSM 2.0), which uses an edge detector, along with an implicit shape function conditioned on edge maps, to leverage cross-modality shape information. The shape function is trained exclusively on a source domain (contrasted CT) and applied to the target domain of interest (3D echocardiography). We demonstrate data efficiency in the target domain by varying the amount of training data used in the edge detection stage. We observe that DCSM 2.0 outperforms the baseline at all data levels in terms of Hausdorff distance, when using 50% or less of the training data in terms of average mesh distance, and when using 10% or less of the data in terms of the Dice coefficient. The method scales well to low-data regimes, with gains of up to 5% in Dice coefficient, 2.58 mm in average surface distance, and 21.02 mm in Hausdorff distance when using just 2% (22 volumes) of the training data.
https://arxiv.org/abs/2407.00186
Large language models (LLMs) often necessitate extensive labeled datasets and training compute to achieve impressive performance across downstream tasks. This paper explores a self-training paradigm, where the LLM autonomously curates its own labels and selectively trains on unknown data samples identified through a reference-free consistency method. Empirical evaluations demonstrate significant improvements in reducing hallucination in generation across multiple subjects. Furthermore, the selective training framework mitigates catastrophic forgetting in out-of-distribution benchmarks, addressing a critical limitation in training LLMs. Our findings suggest that such an approach can substantially reduce the dependency on large labeled datasets, paving the way for more scalable and cost-effective language model training.
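A minimal sketch of the reference-free consistency idea: sample several answers per prompt, keep a self-generated label only when most samples agree, and route low-agreement prompts to the pool of unknown samples for selective training. The generate helper is hypothetical and the agreement rule is only one possible consistency criterion.

    from collections import Counter

    def generate(prompt, n_samples=5):
        """Placeholder for sampling n answers from the LLM (hypothetical helper;
        substitute your own model call)."""
        raise NotImplementedError

    def self_label(prompts, agreement_threshold=0.8):
        """Reference-free consistency filter: a prompt's majority answer is
        kept as a pseudo-label only when agreement is high; otherwise the
        prompt is treated as an unknown sample for selective training."""
        curated, unknown = [], []
        for prompt in prompts:
            answers = generate(prompt)
            top_answer, count = Counter(answers).most_common(1)[0]
            if count / len(answers) >= agreement_threshold:
                curated.append((prompt, top_answer))   # confident pseudo-label
            else:
                unknown.append(prompt)                 # candidate for selective training
        return curated, unknown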
https://arxiv.org/abs/2406.11275
With the widespread application of Light Detection and Ranging (LiDAR) technology in fields such as autonomous driving, robot navigation, and terrain mapping, the importance of edge detection in LiDAR images has become increasingly prominent. Traditional edge detection methods often face challenges in accuracy and computational complexity when processing LiDAR images. To address these issues, this study proposes an edge detection method for LiDAR images based on artificial intelligence technology. This paper first reviews the current state of research on LiDAR technology and image edge detection, introducing common edge detection algorithms and their applications in LiDAR image processing. Subsequently, a deep learning-based edge detection model is designed and implemented, optimizing the model training process through preprocessing and enhancement of the LiDAR image dataset. Experimental results indicate that the proposed method outperforms traditional methods in terms of detection accuracy and computational efficiency, showing significant practical application value. Finally, improvement strategies are proposed for the current method's shortcomings, and the improvements are validated through experiments.
https://arxiv.org/abs/2406.09773