Underwater pipelines are highly susceptible to corrosion, which not only shortens their service life but also poses significant safety risks. Compared with manual inspection, intelligent real-time imaging systems for underwater pipeline detection have become a more reliable and practical solution. Among various underwater imaging techniques, structured light 3D imaging can recover sufficient spatial detail for precise defect characterization. Therefore, this paper develops a multi-mode underwater structured light 3D imaging system for pipeline detection (UW-SLD system) based on multi-source information fusion. First, a fast distortion correction (FDC) method is employed for efficient underwater image rectification. To overcome the challenges of extrinsic calibration among underwater sensors, a factor graph-based parameter optimization method is proposed to estimate the transformation matrix between the structured light and acoustic sensors. Furthermore, a multi-mode 3D imaging strategy is introduced to adapt to the geometric variability of underwater pipelines. Given the numerous disturbances present in underwater environments, a multi-source information fusion strategy and an adaptive extended Kalman filter (AEKF) are designed to ensure stable pose estimation and high-accuracy measurements. In particular, an edge detection-based ICP (ED-ICP) algorithm is proposed, which integrates a pipeline edge detection network with enhanced point cloud registration to achieve robust, high-fidelity reconstruction of defect structures even under variable motion conditions. Extensive experiments are conducted under different operation modes, velocities, and depths. The results demonstrate that the developed system achieves superior accuracy, adaptability, and robustness, providing a solid foundation for autonomous underwater pipeline detection.
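The ED-ICP component rests on classical ICP machinery: associate each source point with its nearest neighbour in the target cloud, then solve for the best rigid transform in closed form via SVD (Kabsch). The NumPy sketch below shows one such iteration; it is a generic baseline, not the paper's ED-ICP, which additionally constrains the association step with edge points from a pipeline edge detection network.

```python
import numpy as np

def best_fit_transform(src, dst):
    """Least-squares rigid transform (R, t) mapping src onto dst (Kabsch/SVD)."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # guard against a reflection solution
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = c_dst - R @ c_src
    return R, t

def icp_step(src, dst):
    """One ICP iteration: nearest-neighbour pairing, then best-fit transform."""
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
    nn = dst[d2.argmin(axis=1)]       # nearest target point for each source point
    R, t = best_fit_transform(src, nn)
    return src @ R.T + t, R, t
```

On real data the step is repeated until the mean residual stops decreasing; an edge-aware variant would simply restrict `src` and `dst` to detected edge points.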
https://arxiv.org/abs/2512.11354
We challenge the common belief that deep learning always trumps older techniques, using the example of grading Saint-Gaudens Double Eagle gold coins automatically. In our work, we put a feature-based Artificial Neural Network built around 192 custom features pulled from Sobel edge detection and HSV color analysis up against a hybrid Convolutional Neural Network that blends in EfficientNetV2, plus a straightforward Support Vector Machine as the control. Testing 1,785 coins graded by experts, the ANN nailed 86% exact matches and hit 98% when allowing a 3-grade leeway. On the flip side, CNN and SVM mostly just guessed the most common grade, scraping by with 31% and 30% exact hits. Sure, the CNN looked good on broader tolerance metrics, but that is because of some averaging trick in regression that hides how it totally flops at picking out specific grades. All told, when you are stuck with under 2,000 examples and lopsided classes, baking in real coin-expert knowledge through feature design beats out those inscrutable, all-in-one deep learning setups. This rings true for other niche quality checks where data's thin and know-how matters more than raw compute.
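The hand-crafted feature side of this comparison can be sketched with a minimal NumPy example: a Sobel gradient pass followed by a normalized edge-strength histogram. The kernels are standard; the paper's 192-feature design is far richer, so treat this as illustrative only.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
SOBEL_Y = SOBEL_X.T

def conv2_valid(img, k):
    """Naive 'valid' 2D convolution with a 3x3 kernel (no padding)."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = (img[i:i + 3, j:j + 3] * k).sum()
    return out

def edge_features(gray, bins=8):
    """Normalized histogram of Sobel gradient magnitude -- one tiny
    hand-crafted feature vector in the spirit of the ANN's inputs."""
    gx, gy = conv2_valid(gray, SOBEL_X), conv2_valid(gray, SOBEL_Y)
    mag = np.hypot(gx, gy)
    hist, _ = np.histogram(mag, bins=bins, range=(0, mag.max() + 1e-9))
    return hist / hist.sum()
```

A comparable HSV block would histogram hue/saturation/value channels and concatenate everything into the classifier's input vector.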
https://arxiv.org/abs/2512.04464
Medical image registration is crucial for various clinical and research applications including disease diagnosis or treatment planning which require alignment of images from different modalities, time points, or subjects. Traditional registration techniques often struggle with challenges such as contrast differences, spatial distortions, and modality-specific variations. To address these limitations, we propose a method that integrates learnable edge kernels with learning-based rigid and non-rigid registration techniques. Unlike conventional layers that learn all features without specific bias, our approach begins with a predefined edge detection kernel, which is then perturbed with random noise. These kernels are learned during training to extract optimal edge features tailored to the task. This adaptive edge detection enhances the registration process by capturing diverse structural features critical in medical imaging. To provide clearer insight into the contribution of each component in our design, we introduce four variant models for rigid registration and four variant models for non-rigid registration. We evaluated our approach using a dataset provided by the Medical University across three setups: rigid registration without skull removal, with skull removal, and non-rigid registration. Additionally, we assessed performance on two publicly available datasets. Across all experiments, our method consistently outperformed state-of-the-art techniques, demonstrating its potential to improve multi-modal image alignment and anatomical structure analysis.
https://arxiv.org/abs/2512.01771
Visual odometry techniques typically rely on feature extraction from a sequence of images and subsequent computation of optical flow. This point-to-point correspondence between two consecutive frames can be costly to compute and suffers from varying accuracy, which affects the quality of the odometry estimate. Attempts have been made to bypass the difficulties originating from the correspondence problem by adopting line features and fusing other sensors (event cameras, IMUs) to improve performance, yet many of these methods still rely heavily on correspondence. If the camera observes a straight line as it moves, the image of the line sweeps out a smooth surface in image space over time. This is a ruled surface, and analyzing its shape yields information about the odometry. Further, its estimation requires only differentially computed updates from point-to-line associations. Inspired by event cameras' propensity for edge detection, this research presents a novel algorithm to reconstruct 3D scenes and visual odometry from these ruled surfaces. By constraining the surfaces with inertial measurements from an onboard IMU sensor, the dimensionality of the solution space is greatly reduced.
https://arxiv.org/abs/2512.00327
The MARWIN robot operates at the European XFEL to perform autonomous radiation monitoring in long, monotonous accelerator tunnels where conventional localization approaches struggle. Its current navigation concept combines lidar-based edge detection, wheel/lidar odometry with periodic QR-code referencing, and fuzzy control of wall distance, rotation, and longitudinal position. While robust in predefined sections, this design lacks flexibility for unknown geometries and obstacles. This paper explores deep visual stereo odometry (DVSO) with 3D-geometric constraints as a focused alternative. DVSO is purely vision-based, leveraging stereo disparity, optical flow, and self-supervised learning to jointly estimate depth and ego-motion without labeled data. For global consistency, DVSO can subsequently be fused with absolute references (e.g., landmarks) or other sensors. We provide a conceptual evaluation for accelerator tunnel environments, using the European XFEL as a case study. Expected benefits include reduced scale drift via stereo, low-cost sensing, and scalable data collection, while challenges remain in low-texture surfaces, lighting variability, computational load, and robustness under radiation. The paper defines a research agenda toward enabling MARWIN to navigate more autonomously in constrained, safety-critical infrastructures.
https://arxiv.org/abs/2512.00080
Fast and accurate video object recognition, which relies on frame-by-frame video analytics, remains a challenge for resource-constrained devices such as traffic cameras. Recent advances in mobile edge computing have made it possible to offload computation-intensive object detection to edge servers equipped with high-accuracy neural networks, while lightweight and fast object tracking algorithms run locally on devices. This hybrid approach offers a promising solution but introduces a new challenge: deciding when to perform edge detection versus local tracking. To address this, we formulate two long-term optimization problems for both single-device and multi-device scenarios, taking into account the temporal correlation of consecutive frames and the dynamic conditions of mobile edge networks. Based on this formulation, we propose LTED-Ada for the single-device setting, a deep reinforcement learning-based algorithm that adaptively selects between local tracking and edge detection according to the frame rate as well as the recognition accuracy and delay requirements. In the multi-device setting, we further enhance LTED-Ada using federated learning to enable collaborative policy training across devices, thereby improving its generalization to unseen frame rates and performance requirements. Finally, we conduct extensive hardware-in-the-loop experiments using multiple Raspberry Pi 4B devices and a personal computer as the edge server, demonstrating the superiority of LTED-Ada.
https://arxiv.org/abs/2511.20716
Monocular simultaneous localization and mapping (SLAM) algorithms estimate drone poses and build a 3D map using a single camera. Current algorithms include sparse methods that lack detailed geometry, while learning-driven approaches produce dense maps but are computationally intensive. Monocular SLAM also faces scale ambiguities, which affect its accuracy. To address these challenges, we propose an edge-aware lightweight monocular SLAM system combining sparse keypoint-based pose estimation with dense edge reconstruction. Our method employs deep learning-based depth prediction and edge detection, followed by optimization to refine keypoints and edges for geometric consistency, without relying on global loop closure or heavy neural computations. We fuse inertial data with vision by using an extended Kalman filter to resolve scale ambiguity and improve accuracy. The system operates in real time on low-power platforms, as demonstrated on a DJI Tello drone with a monocular camera and inertial sensors. In addition, we demonstrate robust autonomous navigation and obstacle avoidance in indoor corridors and on the TUM RGBD dataset. Our approach offers an effective, practical solution to real-time mapping and navigation in resource-constrained environments.
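The fusion idea above — predict motion from inertial data, correct it with a visual measurement — can be shown with a minimal position-velocity filter. With linear motion and measurement models the EKF predict/update equations reduce to the standard Kalman form used below; the 1D state layout and the noise values are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def ekf_predict(x, P, a, dt, q=1e-3):
    """Propagate state [position, velocity] with an IMU acceleration input.
    With this linear model the EKF prediction is the standard Kalman form."""
    F = np.array([[1, dt], [0, 1]])
    x = F @ x + np.array([0.5 * dt**2, dt]) * a
    P = F @ P @ F.T + q * np.eye(2)
    return x, P

def ekf_update(x, P, z, r=1e-2):
    """Correct with a vision-derived position measurement z."""
    H = np.array([[1.0, 0.0]])        # vision observes position only
    S = H @ P @ H.T + r               # innovation covariance
    K = P @ H.T / S                   # Kalman gain
    x = x + (K * (z - H @ x)).ravel()
    P = (np.eye(2) - K @ H) @ P
    return x, P
```

Resolving monocular scale ambiguity works on the same principle: the inertial prediction carries metric scale, and the update pulls the vision estimate onto it.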
https://arxiv.org/abs/2511.14335
Chess has experienced a large increase in viewership since the pandemic, driven largely by the accessibility of online learning platforms. However, no equivalent assistance exists for physical chess games, creating a divide between analog and digital chess experiences. This paper presents CVChess, a deep learning framework for converting chessboard images to Forsyth-Edwards Notation (FEN), which is then input into online chess engines to provide the best next move. Our approach employs a convolutional neural network (CNN) with residual layers to perform piece recognition from smartphone camera images. The system processes RGB images of a physical chess board through a multistep process: image preprocessing using the Hough Line Transform for edge detection, a projective transform to achieve top-down board alignment, segmentation into 64 individual squares, and piece classification into 13 classes (6 unique white pieces, 6 unique black pieces, and the empty square) using the residual CNN. Residual connections help retain low-level visual features while enabling deeper feature extraction, improving accuracy and stability during training. We train and evaluate our model using the Chess Recognition Dataset (ChessReD), containing 10,800 annotated smartphone images captured under diverse lighting conditions and angles. The resulting classifications are encoded as an FEN string, which can be fed into a chess engine to generate the optimal move.
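The final encoding step — turning the 8x8 grid of per-square classifications into FEN — is mechanical. The sketch below emits only the piece-placement field (side to move, castling rights, and the other FEN fields are omitted), using the conventional letters with uppercase for White:

```python
def grid_to_fen_placement(grid):
    """Encode an 8x8 board of piece symbols ('' = empty square) as the FEN
    piece-placement field, rank 8 (row 0) down to rank 1 (row 7)."""
    ranks = []
    for row in grid:
        field, empties = "", 0
        for sq in row:
            if sq == "":
                empties += 1              # run-length encode empty squares
            else:
                if empties:
                    field += str(empties)
                    empties = 0
                field += sq
        if empties:
            field += str(empties)
        ranks.append(field)
    return "/".join(ranks)
```

Given the classifier's 13-way output per square, mapping class indices to these symbols and calling this function yields the string a chess engine consumes.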
https://arxiv.org/abs/2511.11522
Understanding and mitigating flicker effects caused by rapid variations in light intensity is critical for enhancing the performance of event cameras in diverse environments. This paper introduces an innovative autonomous mechanism for tuning the biases of event cameras, effectively addressing flicker across a wide frequency range (25 Hz to 500 Hz). Unlike traditional methods that rely on additional hardware or software for flicker filtering, our approach leverages the event camera's inherent bias settings. Utilizing a simple Convolutional Neural Network (CNN), the system identifies instances of flicker in the spatial domain and dynamically adjusts specific biases to minimize its impact. The efficacy of this autobiasing system was robustly tested using a face detector framework under both well-lit and low-light conditions, as well as across various frequencies. The results demonstrated significant improvements: enhanced YOLO confidence metrics for face detection, and an increased percentage of frames capturing detected faces. Moreover, the average gradient, which serves as an indicator of flicker presence through edge detection, decreased by 38.2 percent in well-lit conditions and by 53.6 percent in low-light conditions. These findings underscore the potential of our approach to significantly improve the functionality of event cameras in a range of adverse lighting scenarios.
https://arxiv.org/abs/2511.02180
Reconstruction of an object from a point cloud is essential in prosthetics, medical imaging, computer vision, etc. We present an effective algorithm for an Allen–Cahn-type model of reconstruction, employing the Lagrange multiplier approach. Utilizing scattered data points from an object, we reconstruct a narrow shell by solving the governing equation enhanced with an edge detection function derived from the unsigned distance function. The specifically designed edge detection function ensures energy stability. By reformulating the governing equation through the Lagrange multiplier technique and implementing a Crank–Nicolson time discretization, we can update the solutions in a stable and decoupled manner. The spatial operators are approximated using the finite difference method, and we analytically demonstrate the unconditional stability of the fully discrete scheme. Comprehensive numerical experiments, including reconstructions of complex 3D volumes such as characters from Star Wars, validate the algorithm's accuracy, stability, and effectiveness. Additionally, we analyze how specific parameter selections influence the level of detail and refinement in the reconstructed volumes. To facilitate interested readers' understanding of our algorithm, we share the computational codes and data at this https URL.
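The governing equation can be sketched schematically; the paper's exact formulation (in particular its Lagrange multiplier term and the precise edge function) may differ, so the following is a generic Allen–Cahn-type model with an edge-stopping weight $g$ built from the unsigned distance $d(\mathbf{x})$ to the scattered data, not the authors' definitive equation:

```latex
\phi_t = g(\mathbf{x})\left(\epsilon^{2}\,\Delta\phi - F'(\phi)\right),
\qquad
F(\phi) = \tfrac{1}{4}\left(\phi^{2}-1\right)^{2},
\qquad
g(\mathbf{x}) \to 0 \ \ \text{as}\ \ d(\mathbf{x}) \to 0 .
```

Away from the data the phase field $\phi$ relaxes toward the stable states $\pm 1$, while the vanishing weight pins the interface, i.e. the reconstructed narrow shell, onto the point cloud; a Lagrange multiplier term is then added so the Crank–Nicolson discretization can be proven unconditionally stable.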
https://arxiv.org/abs/2511.00508
Autonomous Vehicles (AVs) are transforming the future of transportation through advances in intelligent perception, decision-making, and control systems. However, their success is tied to one core capability, reliable object detection in complex and multimodal environments. While recent breakthroughs in Computer Vision (CV) and Artificial Intelligence (AI) have driven remarkable progress, the field still faces a critical challenge as knowledge remains fragmented across multimodal perception, contextual reasoning, and cooperative intelligence. This survey bridges that gap by delivering a forward-looking analysis of object detection in AVs, emphasizing emerging paradigms such as Vision-Language Models (VLMs), Large Language Models (LLMs), and Generative AI rather than re-examining outdated techniques. We begin by systematically reviewing the fundamental spectrum of AV sensors (camera, ultrasonic, LiDAR, and Radar) and their fusion strategies, highlighting not only their capabilities and limitations in dynamic driving environments but also their potential to integrate with recent advances in LLM/VLM-driven perception frameworks. Next, we introduce a structured categorization of AV datasets that moves beyond simple collections, positioning ego-vehicle, infrastructure-based, and cooperative datasets (e.g., V2V, V2I, V2X, I2I), followed by a cross-analysis of data structures and characteristics. Ultimately, we analyze cutting-edge detection methodologies, ranging from 2D and 3D pipelines to hybrid sensor fusion, with particular attention to emerging transformer-driven approaches powered by Vision Transformers (ViTs), Large and Small Language Models (SLMs), and VLMs. By synthesizing these perspectives, our survey delivers a clear roadmap of current capabilities, open challenges, and future opportunities.
https://arxiv.org/abs/2510.26641
Edge detection is an essential task in image processing, aiming to extract relevant information from an image. One recurring problem in this task is the weaknesses found in some detectors, such as difficulty in detecting loose edges and a lack of context for extracting relevant information from specific problems. To address these weaknesses and adapt the detector to the properties of an image, an adaptable detector described by a two-dimensional cellular automaton and optimized by a meta-heuristic combined with transfer learning techniques was developed. This study analyzes the impact of expanding the search space of the optimization phase and the robustness of the detector's adaptability in identifying edges in a set of natural images and in specialized subsets extracted from the same image set. The results show that expanding the search space of the optimization phase was not effective for the chosen image set. The study also analyzed the adaptability of the model through a series of experiments and validation techniques and found that, regardless of the validation, the model was able to adapt to the input, while the transfer learning techniques applied to the model showed no significant improvements.
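A two-dimensional cellular-automaton edge rule, in its simplest form, keeps a foreground cell only when at least one 4-neighbour is background. The toy rule below conveys the mechanism; the actual detector evolves its rule set with a meta-heuristic, which is not reproduced here.

```python
import numpy as np

def ca_edge_step(grid):
    """One synchronous update of a toy 2D cellular-automaton edge rule on a
    binary image: a cell stays foreground iff it is foreground and at least
    one of its 4-neighbours is background."""
    g = np.pad(grid, 1, mode="edge")   # replicate borders to avoid fake edges
    nbr_min = np.minimum.reduce([
        g[:-2, 1:-1],  # up
        g[2:, 1:-1],   # down
        g[1:-1, :-2],  # left
        g[1:-1, 2:],   # right
    ])
    return ((grid == 1) & (nbr_min == 0)).astype(int)
```

On a filled region this leaves exactly the one-cell-thick boundary, which is the behaviour an optimized rule set refines for natural images.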
https://arxiv.org/abs/2510.26509
Intelligent vehicles (IVs) are one of the most important outcomes of the global trend toward automation. Applications of IVs, whether on urban roads or robot tracks, prioritize lane path detection. This paper proposes an FPGA-based Lane Detector Vehicle (LDV) architecture that relies on the Sobel algorithm for edge detection. Operating on 416 x 416 images at 150 MHz, the system generates a valid output every 1.17 ms. The valid output consists of the number of lanes present, the current lane index, and its right and left boundaries. Additionally, the automated light and temperature control units in the proposed system enhance its adaptability to the surrounding environmental conditions.
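The valid output described above (lane count, current lane index, and its boundaries) can be modeled in software by grouping detected marking positions into lanes; the real design computes this in FPGA logic, and this grouping function is a hypothetical reconstruction for illustration only.

```python
def lane_info(marking_cols, vehicle_col):
    """From sorted x-positions of detected lane markings in one image row,
    derive (lane_count, current_lane_index, left_boundary, right_boundary).
    Lanes are the gaps between consecutive markings, indexed from the left."""
    cols = sorted(marking_cols)
    lanes = list(zip(cols, cols[1:]))          # (left, right) per lane
    for i, (lo, hi) in enumerate(lanes):
        if lo <= vehicle_col < hi:
            return len(lanes), i, lo, hi
    return len(lanes), None, None, None        # vehicle outside all lanes
```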
https://arxiv.org/abs/2510.24778
This study introduces a modular framework for spatial image processing, integrating grayscale quantization, color and brightness enhancement, image sharpening, bidirectional transformation pipelines, and geometric feature extraction. A stepwise intensity transformation quantizes grayscale images into eight discrete levels, producing a posterization effect that simplifies representation while preserving structural detail. Color enhancement is achieved via histogram equalization in both RGB and YCrCb color spaces, with the latter improving contrast while maintaining chrominance fidelity. Brightness adjustment is implemented through HSV value-channel manipulation, and image sharpening is performed using a 3×3 convolution kernel to enhance high-frequency details. A bidirectional transformation pipeline that integrates unsharp masking, gamma correction, and noise amplification achieved accuracy levels of 76.10% and 74.80% for the forward and reverse processes, respectively. Geometric feature extraction employed Canny edge detection, Hough-based line estimation (e.g., 51.50° for billiard cue alignment), Harris corner detection, and morphological window localization. Cue isolation further yielded 81.87% similarity against ground truth images. Experimental evaluation across diverse datasets demonstrates robust and deterministic performance, highlighting the framework's potential for real-time image analysis and computer vision.
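The eight-level stepwise intensity transform can be realized as a uniform quantizer; the 32-wide bins mapped to bin centres below are an assumption, since the study does not state its exact bin edges.

```python
import numpy as np

def posterize8(gray):
    """Quantize an 8-bit grayscale image into 8 discrete levels (a stepwise
    intensity transform producing a posterization effect). Each 32-wide
    intensity bin is mapped to its centre value."""
    levels = (gray.astype(np.uint16) // 32) * 32 + 16
    return levels.astype(np.uint8)
```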
https://arxiv.org/abs/2510.08449
Medical Image Segmentation (MIS) stands as a cornerstone in medical image analysis, playing a pivotal role in precise diagnostics, treatment planning, and monitoring of various medical conditions. This paper presents a comprehensive and systematic survey of MIS methodologies, bridging the gap between traditional image processing techniques and modern deep learning approaches. The survey encompasses thresholding, edge detection, region-based segmentation, clustering algorithms, and model-based techniques while also delving into state-of-the-art deep learning architectures such as Convolutional Neural Networks (CNNs), Fully Convolutional Networks (FCNs), and the widely adopted U-Net and its variants. Moreover, integrating attention mechanisms, semi-supervised learning, generative adversarial networks (GANs), and Transformer-based models is thoroughly explored. In addition to covering established methods, this survey highlights emerging trends, including hybrid architectures, cross-modality learning, federated and distributed learning frameworks, and active learning strategies, which aim to address challenges such as limited labeled datasets, computational complexity, and model generalizability across diverse imaging modalities. Furthermore, a specialized case study on lumbar spine segmentation is presented, offering insights into the challenges and advancements in this relatively underexplored anatomical region. Despite significant progress in the field, critical challenges persist, including dataset bias, domain adaptation, interpretability of deep learning models, and integration into real-world clinical workflows.
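As a concrete instance of the thresholding family surveyed here, Otsu's method chooses the gray level that maximizes the between-class variance of the resulting binary split; a direct NumPy transcription:

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method: exhaustively pick the threshold t maximizing the
    between-class variance w0*w1*(m0-m1)^2 of the two resulting classes."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()      # class probabilities
        if w0 == 0 or w1 == 0:
            continue
        m0 = (np.arange(t) * p[:t]).sum() / w0         # class means
        m1 = (np.arange(t, 256) * p[t:]).sum() / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t
```

Such global thresholds are the baseline against which region-based and learned segmentations are usually compared in the surveyed literature.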
https://arxiv.org/abs/2510.03318
Drone-based rapid and accurate environmental edge detection is highly advantageous for tasks such as disaster relief and autonomous navigation. Current methods, using radars or cameras, raise deployment costs and burden lightweight drones with high computational demands. In this paper, we propose AirTouch, a system that transforms the ground effect from a stability "foe" in traditional flight control views into a "friend" for accurate and efficient edge detection. Our key insight is that analyzing a drone's basic attitude sensor readings and flight commands allows us to detect ground effect changes. Such changes typically indicate the drone flying over a boundary between two materials, making this information valuable for edge detection. We approach this insight through theoretical analysis, algorithm design, and implementation, fully leveraging the ground effect as a new sensing modality without compromising drone flight stability, thereby achieving accurate and efficient scene edge detection. We also compare this new sensing modality with vision-based methods to clarify its exclusive advantages in resource efficiency and detection capability. Extensive evaluations demonstrate that our system achieves high detection accuracy with a mean detection distance error of 0.051 m, outperforming the baseline method by 86%. With such detection performance, our system requires only 43 mW of power consumption, establishing this new sensing modality for low-cost and highly efficient edge detection.
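The key insight — that a ground-effect change shows up as a jump in basic flight signals — can be caricatured as simple change detection on a command signal. Everything below (the signal, window, and threshold) is hypothetical; the paper's detector is derived from a theoretical analysis of attitude readings and flight commands, not this moving-average rule.

```python
def ground_effect_edges(thrust, window=5, thresh=0.2):
    """Flag time indices where a (normalized) thrust command jumps away from
    its recent moving average -- a crude stand-in for detecting the change
    in ground effect as the drone crosses a material boundary."""
    edges = []
    for i in range(window, len(thrust)):
        baseline = sum(thrust[i - window:i]) / window
        if abs(thrust[i] - baseline) > thresh:
            edges.append(i)
    return edges
```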
https://arxiv.org/abs/2509.21085
We present an automated vision-based system for defect detection and classification of laser power meter sensor coatings. Our approach addresses the critical challenge of identifying coating defects such as thermal damage and scratches that can compromise laser energy measurement accuracy in medical and industrial applications. The system employs an unsupervised anomaly detection framework that trains exclusively on "good" sensor images to learn the normal coating distribution, enabling detection of both known and novel defect types without requiring extensive labeled defect datasets. Our methodology consists of three key components: (1) a robust preprocessing pipeline using Laplacian edge detection and K-means clustering to segment the area of interest, (2) synthetic data augmentation via StyleGAN2, and (3) a UFlow-based neural network architecture for multi-scale feature extraction and anomaly map generation. Experimental evaluation on 366 real sensor images demonstrates 93.8% accuracy on defective samples and 89.3% accuracy on good samples, with an image-level AUROC of 0.957 and a pixel-level AUROC of 0.961. The system offers potential annual cost savings through automated quality control, with a processing time of 0.5 seconds per image in the on-device implementation.
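The preprocessing stage pairs Laplacian edges with K-means clustering to isolate the coated area. A scalar K-means on pixel intensities, as sketched below, shows the clustering half of that step; the quantile initialization and the two-cluster setting are assumptions, not details from the paper.

```python
import numpy as np

def kmeans_1d(values, k=2, iters=20):
    """Plain k-means on scalar values (e.g. pixel intensities), the kind of
    clustering one might use to split a sensor image into coating vs.
    background regions. Returns the sorted cluster centres."""
    # Initialize centres at evenly spaced quantiles of the data.
    centers = np.quantile(values, (np.arange(k) + 0.5) / k)
    for _ in range(iters):
        labels = np.abs(values[:, None] - centers[None, :]).argmin(axis=1)
        centers = np.array([
            values[labels == j].mean() if (labels == j).any() else centers[j]
            for j in range(k)
        ])
    return np.sort(centers)
```

Thresholding at the midpoint of the two recovered centres then yields the region-of-interest mask that the anomaly detector operates on.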
https://arxiv.org/abs/2509.20946
Traditional object detection methods face performance degradation challenges in complex scenarios such as low-light conditions and heavy occlusions due to a lack of high-level semantic understanding. To address this, this paper proposes an adaptive guidance-based semantic enhancement edge-cloud collaborative object detection method leveraging Multimodal Large Language Models (MLLM), achieving an effective balance between accuracy and efficiency. Specifically, the method first employs instruction fine-tuning to enable the MLLM to generate structured scene descriptions. It then designs an adaptive mapping mechanism that dynamically converts semantic information into parameter adjustment signals for edge detectors, achieving real-time semantic enhancement. Within an edge-cloud collaborative inference framework, the system automatically selects between invoking cloud-based semantic guidance or directly outputting edge detection results based on confidence scores. Experiments demonstrate that the proposed method effectively enhances detection accuracy and efficiency in complex scenes. Specifically, it can reduce latency by over 79% and computational cost by 70% in low-light and highly occluded scenes while maintaining accuracy.
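The confidence-based selection between edge output and cloud guidance can be written as a small decision rule; the threshold, latency budget, and round-trip figure below are placeholders for illustration, not values from the paper.

```python
def decide(conf, acc_req=0.9, lat_budget_ms=100, cloud_rtt_ms=79):
    """Sketch of confidence-based edge-cloud selection: output the edge
    detection directly when its confidence meets the requirement, otherwise
    pay the cloud round-trip for semantic guidance -- unless that would
    blow the latency budget, in which case fall back to the edge result."""
    if conf >= acc_req:
        return "edge", 0
    if cloud_rtt_ms <= lat_budget_ms:
        return "cloud", cloud_rtt_ms
    return "edge", 0
```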
https://arxiv.org/abs/2509.19875
Though deep learning techniques have made great progress in salient object detection recently, the predicted saliency maps still suffer from incomplete predictions due to the internal complexity of objects and inaccurate boundaries caused by strides in convolution and pooling operations. To alleviate these issues, we propose to train saliency detection networks by exploiting the supervision from not only salient object detection, but also foreground contour detection and edge detection. First, we leverage salient object detection and foreground contour detection tasks in an intertwined manner to generate saliency maps with uniform highlight. Second, the foreground contour and edge detection tasks guide each other simultaneously, thereby leading to precise foreground contour prediction and reducing the local noises for edge prediction. In addition, we develop a novel mutual learning module (MLM) which serves as the building block of our method. Each MLM consists of multiple network branches trained in a mutual learning manner, which improves the performance by a large margin. Extensive experiments on seven challenging datasets demonstrate that the proposed method has delivered state-of-the-art results in both salient object detection and edge detection.
https://arxiv.org/abs/2509.21363
Deep Neural Networks (DNNs) require highly efficient matrix multiplication engines for complex computations. This paper presents a systolic array architecture incorporating novel exact and approximate processing elements (PEs), designed using energy-efficient positive partial product and negative partial product cells, termed PPC and NPPC, respectively. The proposed 8-bit exact and approximate PE designs are employed in an 8x8 systolic array, which achieves energy savings of 22% and 32%, respectively, compared to the existing design. To demonstrate their effectiveness, the proposed PEs are integrated into a systolic array (SA) for Discrete Cosine Transform (DCT) computation, achieving high output quality with a PSNR of 38.21 dB. Furthermore, in an edge detection application using convolution, the approximate PE achieves a PSNR of 30.45 dB. These results highlight the potential of the proposed design to deliver significant energy efficiency while maintaining competitive output quality, making it well suited for error-resilient image and vision processing applications.
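The exact/approximate PE contrast can be conveyed with shift-add partial-product multiplication: the approximate variant simply drops the low-order partial products. This truncation scheme is a stand-in for illustration; the paper's PPC/NPPC cells realize a different, sign-aware partial-product decomposition.

```python
def exact_mul8(a, b):
    """8-bit unsigned multiply by partial-product accumulation (shift-add),
    the operation a systolic-array PE implements."""
    acc = 0
    for i in range(8):
        if (b >> i) & 1:
            acc += a << i          # i-th partial product
    return acc

def approx_mul8(a, b, drop=3):
    """Approximate variant: discard the `drop` least-significant partial
    products, a common energy-saving simplification. The absolute error is
    bounded by a * (2**drop - 1)."""
    acc = 0
    for i in range(drop, 8):
        if (b >> i) & 1:
            acc += a << i
    return acc
```

In error-resilient workloads like DCT or edge-detection convolutions, the bounded low-order error translates into a modest PSNR loss in exchange for fewer adder cells toggling.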
https://arxiv.org/abs/2509.00778