Manual pruning of radiata pine trees poses significant safety risks due to extreme working heights and challenging terrain. This paper presents a computer vision framework that integrates YOLO object detection with Semi-Global Block Matching (SGBM) stereo vision for autonomous drone-based pruning operations. Our system achieves precise branch detection and depth estimation using only stereo camera input, eliminating the need for expensive LiDAR sensors. Experimental evaluation demonstrates YOLO's superior performance over Mask R-CNN, achieving a mask mAP50-95 of 82.0% for branch segmentation. The integrated system accurately localizes branches within a 2 m operational range, with processing times under one second per frame. These results establish the feasibility of cost-effective autonomous pruning systems that enhance worker safety and operational efficiency in commercial forestry.
https://arxiv.org/abs/2512.05412
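For the pruning framework above, the step from an SGBM disparity map to a branch distance can be sketched with the standard pinhole stereo relation Z = f * B / d. This is a minimal sketch, not the authors' implementation: the focal length, baseline, detection box, and the helper names `disparity_to_depth` and `branch_distance` are all assumptions made for illustration, and only the 2 m operating range comes from the abstract.

```python
import numpy as np

def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Convert a disparity map (pixels) to metric depth via Z = f * B / d."""
    disparity_px = np.asarray(disparity_px, dtype=np.float64)
    depth_m = np.full(disparity_px.shape, np.nan)
    valid = disparity_px > 0  # zero or negative disparity means no match
    depth_m[valid] = focal_px * baseline_m / disparity_px[valid]
    return depth_m

def branch_distance(depth_m, box, max_range_m=2.0):
    """Median depth inside a detection box (x0, y0, x1, y1).

    Returns None when the box holds no valid depth or the branch lies
    beyond max_range_m (hypothetical handling of the 2 m range).
    """
    x0, y0, x1, y1 = box
    patch = depth_m[y0:y1, x0:x1]
    if np.all(np.isnan(patch)):
        return None
    distance = float(np.nanmedian(patch))
    return distance if distance <= max_range_m else None

if __name__ == "__main__":
    # Assumed camera: 700 px focal length, 12 cm baseline (illustrative values).
    disparity = np.full((4, 4), 70.0)
    depth = disparity_to_depth(disparity, focal_px=700.0, baseline_m=0.12)
    print(branch_distance(depth, (0, 0, 4, 4)))
```

Taking the median inside the box is one simple way to keep background pixels from skewing the estimate. With a segmentation mask, as the abstract describes, the same statistic could be restricted to the masked pixels.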
Traditional stereo matching algorithms like Semi-Global Block Matching (SGBM) with Weighted Least Squares (WLS) filtering offer speed advantages over neural networks for UAV applications, generating disparity maps in approximately 0.5 seconds per frame. However, these algorithms require meticulous parameter tuning. We propose a Genetic Algorithm (GA) based parameter optimization framework that systematically searches for optimal parameter configurations for SGBM and WLS, enabling UAVs to measure distances to tree branches with enhanced precision while maintaining processing efficiency. Our contributions include: (1) a novel GA-based parameter optimization framework that eliminates manual tuning; (2) a comprehensive evaluation methodology using multiple image quality metrics; and (3) a practical solution for resource-constrained UAV systems. Experimental results demonstrate that our GA-optimized approach reduces Mean Squared Error by 42.86% while increasing Peak Signal-to-Noise Ratio and Structural Similarity by 8.47% and 28.52%, respectively, compared with baseline configurations. Furthermore, our approach demonstrates superior generalization performance across varied imaging conditions, which is critical for real-world forestry applications.
https://arxiv.org/abs/2512.05410
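The GA-based search described in the abstract above can be illustrated with a small, self-contained sketch. Everything specific here is an assumption: the parameter names and bounds in `BOUNDS` only mirror common SGBM and WLS knobs, `toy_fitness` is a stand-in for "error of the filtered disparity map against a reference", and the selection, crossover, and mutation operators are generic choices rather than the paper's.

```python
import random

# Hypothetical search ranges; illustrative only, not the paper's settings.
BOUNDS = {
    "block_size": (3, 15),
    "p1": (8, 400),
    "p2": (32, 1600),
    "wls_lambda": (1000, 20000),
}

def toy_fitness(params):
    """Stand-in objective (lower is better) with an arbitrary optimum.

    In a real pipeline this would run SGBM + WLS with `params` and
    return an error metric such as MSE against ground truth.
    """
    target = {"block_size": 7, "p1": 100, "p2": 800, "wls_lambda": 8000}
    return sum(((params[k] - target[k]) / (BOUNDS[k][1] - BOUNDS[k][0])) ** 2
               for k in BOUNDS)

def random_individual(rng):
    return {k: rng.randint(lo, hi) for k, (lo, hi) in BOUNDS.items()}

def crossover(a, b, rng):
    # Uniform crossover: each gene is copied from one of the two parents.
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in BOUNDS}

def mutate(ind, rng, rate=0.2):
    # Resample a gene inside its bounds with probability `rate`.
    return {k: (rng.randint(*BOUNDS[k]) if rng.random() < rate else v)
            for k, v in ind.items()}

def run_ga(fitness, pop_size=30, generations=40, elite=4, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        parents = population[: pop_size // 2]  # truncation selection
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - elite)
        ]
        population = population[:elite] + children  # elitism keeps the best
    return min(population, key=fitness)

if __name__ == "__main__":
    best = run_ga(toy_fitness)
    print(best, toy_fitness(best))
```

Because the elite individuals are carried over unchanged, the best fitness never gets worse from one generation to the next, which is a convenient property when each fitness evaluation is as expensive as a full stereo-matching run.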
This work presents an event-triggered switching control framework for a class of nonlinear underactuated multi-channel systems with input constraints. These systems are inspired by cooperative manipulation tasks involving underactuation, where multiple underactuated agents collaboratively push or pull an object to a target pose. Unlike existing approaches for multi-channel systems, our method addresses underactuation and the potential loss of controllability by additionally addressing channel assignment of agents. To simultaneously account for channel assignment, input constraints, and stabilization, we formulate the control problem as a Mixed Integer Linear Programming and derive sufficient conditions for its feasibility. To improve real-time computation efficiency, we introduce an event-triggered control scheme that maintains stability even between switching events through a quadratic programming-based stabilizing controller. We theoretically establish the semi-global exponential stability of the proposed method and the asymptotic stability of its extension to nonprehensile cooperative manipulation under noninstantaneous switching. The proposed framework is further validated through numerical simulations on 2D and 3D free-flyer systems and multi-robot nonprehensile pushing tasks.
https://arxiv.org/abs/2511.22810
In this paper, an adaptive controller is designed for the synchronization of the trajectory of a robot with unknown kinematics and dynamics to that of the current human trajectory in the task space using the delayed human trajectory information. The communication time delay may be a result of various factors that arise in human-robot collaboration tasks, such as sensor processing or fusion to estimate trajectory/intent, network delays, or computational limitations. The developed adaptive controller uses Barrier Lyapunov Function (BLF) to constrain the Cartesian coordinates of the robot to ensure safety, an ICL-based adaptive law to account for the unknown kinematics, and a gradient-based adaptive law to estimate unknown dynamics. Barrier Lyapunov-Krasovskii (LK) functionals are used for the stability analysis to show that the synchronization and parameter estimation errors remain semi-globally uniformly ultimately bounded (SGUUB). The simulation results based on a human-robot synchronization scenario with time delay are provided to demonstrate the effectiveness of the designed synchronization controller with safety constraints.
https://arxiv.org/abs/2509.22976
In this work, we present the first stability results for approximate predictors in multi-input non-linear systems with distinct actuation delays. We show that if the predictor approximation satisfies a uniform (in time) error bound, semi-global practical stability is correspondingly achieved. For such approximators, the required uniform error bound depends on the desired region of attraction and the number of control inputs in the system. The result is achieved through transforming the delay into a transport PDE and conducting analysis on the coupled ODE-PDE cascade. To highlight the viability of such error bounds, we demonstrate our results on a class of approximators - neural operators - showcasing sufficiency for satisfying such a universal bound both theoretically and in simulation on a mobile robot experiment.
https://arxiv.org/abs/2509.17131
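The delay-to-PDE transformation mentioned in the abstract above follows a standard construction, sketched here for orientation. The symbols $X$, $U_i$, $D_i$, $u_i$, and $f$ are assumed notation and may differ from the paper's.

```latex
% Multi-input system with distinct input delays D_1, ..., D_m:
\dot{X}(t) = f\bigl(X(t),\, U_1(t - D_1),\, \dots,\, U_m(t - D_m)\bigr)

% Represent each delayed input by a transport state on x \in [0, D_i]:
u_i(x, t) = U_i(t + x - D_i)

% Each u_i then satisfies a transport PDE driven at the boundary x = D_i:
\partial_t u_i(x, t) = \partial_x u_i(x, t), \qquad u_i(D_i, t) = U_i(t)

% and the plant becomes an ODE fed by the PDE outlets at x = 0:
\dot{X}(t) = f\bigl(X(t),\, u_1(0, t),\, \dots,\, u_m(0, t)\bigr)
```

Evaluating $u_i$ at $x = 0$ recovers $U_i(t - D_i)$, so the ODE-PDE cascade is equivalent to the original delayed system.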
Mars exploration requires precise and reliable terrain models to ensure safe rover navigation across its unpredictable and often hazardous landscapes. Stereoscopic vision serves a critical role in the rover's perception, allowing scene reconstruction by generating precise depth maps through stereo matching. State-of-the-art Martian planetary exploration uses traditional local block-matching, aggregates cost over square windows, and refines disparities via smoothness constraints. However, this method often struggles with low-texture images, occlusion, and repetitive patterns because it considers only limited neighbouring pixels and lacks a wider understanding of scene context. This paper uses Semi-Global Matching (SGM) with superpixel-based refinement to mitigate the inherent block artefacts and recover lost details. The approach balances the efficiency and accuracy of SGM and adds context-aware segmentation to support more coherent depth inference. The proposed method has been evaluated in three datasets with successful results: In a Mars analogue, the terrain maps obtained show improved structural consistency, particularly in sloped or occlusion-prone regions. Large gaps behind rocks, which are common in raw disparity outputs, are reduced, and surface details like small rocks and edges are captured more accurately. Another two datasets, evaluated to test the method's general robustness and adaptability, show more precise disparity maps and more consistent terrain models, better suited for the demands of autonomous navigation on Mars, and competitive accuracy across both non-occluded and full-image error metrics. This paper outlines the entire terrain modelling process, from finding corresponding features to generating the final 2D navigation maps, offering a complete pipeline suitable for integration in future planetary exploration missions.
https://arxiv.org/abs/2509.05645
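The Semi-Global Matching step that the Mars terrain abstract above builds on aggregates matching costs along several 1D paths. The sketch below implements the usual SGM recurrence for a single left-to-right path on one image row. The penalty values `p1` and `p2` and the function name are illustrative assumptions, and the superpixel refinement described in the abstract is not shown.

```python
import numpy as np

def aggregate_left_to_right(cost, p1=10.0, p2=120.0):
    """Aggregate a cost slice of shape (width, n_disp) along one path.

    L(x, d) = C(x, d)
              + min(L(x-1, d),
                    L(x-1, d-1) + P1, L(x-1, d+1) + P1,
                    min_k L(x-1, k) + P2)
              - min_k L(x-1, k)
    """
    cost = np.asarray(cost, dtype=np.float64)
    width, _ = cost.shape
    agg = np.empty_like(cost)
    agg[0] = cost[0]  # the first pixel has no predecessor on this path
    for x in range(1, width):
        prev = agg[x - 1]
        prev_min = prev.min()
        # Costs of the neighbouring disparities d-1 and d+1 (inf at borders).
        minus = np.concatenate(([np.inf], prev[:-1])) + p1
        plus = np.concatenate((prev[1:], [np.inf])) + p1
        best = np.minimum(np.minimum(prev, minus),
                          np.minimum(plus, prev_min + p2))
        # Subtracting prev_min keeps the aggregated values bounded.
        agg[x] = cost[x] + best - prev_min
    return agg

if __name__ == "__main__":
    row_cost = [[0.0, 5.0, 5.0], [5.0, 0.0, 5.0]]
    agg = aggregate_left_to_right(row_cost, p1=1.0, p2=3.0)
    print(agg.argmin(axis=1))  # winner-take-all disparity per pixel
```

A full SGM implementation sums such aggregated costs over several path directions (commonly 8 or 16) before picking the disparity with the lowest total.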
Deformable image registration (DIR) is a crucial and challenging technique for aligning anatomical structures in medical images and is widely applied in diverse clinical applications. However, existing approaches often struggle to capture fine-grained local deformations and large-scale global deformations simultaneously within a unified framework. We present FractMorph, a novel 3D dual-parallel transformer-based architecture that enhances cross-image feature matching through multi-domain fractional Fourier transform (FrFT) branches. Each Fractional Cross-Attention (FCA) block applies parallel FrFTs at fractional angles of 0°, 45°, 90°, along with a log-magnitude branch, to effectively extract local, semi-global, and global features at the same time. These features are fused via cross-attention between the fixed and moving image streams. A lightweight U-Net style network then predicts a dense deformation field from the transformer-enriched features. On the ACDC cardiac MRI dataset, FractMorph achieves state-of-the-art performance with an overall Dice Similarity Coefficient (DSC) of 86.45%, an average per-structure DSC of 75.15%, and a 95th-percentile Hausdorff distance (HD95) of 1.54 mm on our data split. We also introduce FractMorph-Light, a lightweight variant of our model with only 29.6M parameters, which maintains the superior accuracy of the main model while using approximately half the memory. Our results demonstrate that multi-domain spectral-spatial attention in transformers can robustly and efficiently model complex non-rigid deformations in medical images using a single end-to-end network, without the need for scenario-specific tuning or hierarchical multi-scale networks. The source code of our implementation is available at this https URL.
https://arxiv.org/abs/2508.12445
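The Dice Similarity Coefficient reported in the FractMorph abstract above has a simple definition, DSC = 2|A ∩ B| / (|A| + |B|), which the following sketch computes per label and averaged over labels. The function names and the empty-mask convention are assumptions. The abstract does not say how the "overall" figure is pooled, so only the per-structure average is shown.

```python
import numpy as np

def dice_coefficient(pred, target):
    """Dice similarity between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return float(2.0 * np.logical_and(pred, target).sum() / denom)

def mean_structure_dice(pred_labels, target_labels, labels):
    """Average Dice over the given integer structure labels."""
    pred_labels = np.asarray(pred_labels)
    target_labels = np.asarray(target_labels)
    scores = [dice_coefficient(pred_labels == lab, target_labels == lab)
              for lab in labels]
    return float(np.mean(scores))

if __name__ == "__main__":
    warped = np.array([1, 1, 2, 0])  # labels of the warped moving image
    fixed = np.array([1, 2, 2, 0])   # labels of the fixed image
    print(mean_structure_dice(warped, fixed, labels=[1, 2]))
```

In registration work the masks compared are typically the fixed image's segmentation and the moving image's segmentation after warping with the predicted deformation field.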
Deep learning dominates image classification tasks, yet understanding how models arrive at predictions remains a challenge. Much research focuses on local explanations of individual predictions, such as saliency maps, which visualise the influence of specific pixels on a model's prediction. However, reviewing many of these explanations to identify recurring patterns is infeasible, while global methods often oversimplify and miss important local behaviours. To address this, we propose Segment Attribution Tables (SATs), a method for summarising local saliency explanations into (semi-)global insights. SATs take image segments (such as "eyes" in Chihuahuas) and leverage saliency maps to quantify their influence. These segments highlight concepts the model relies on across instances and reveal spurious correlations, such as reliance on backgrounds or watermarks, even when out-of-distribution test performance sees little change. SATs can explain any classifier for which a form of saliency map can be produced, using segmentation maps that provide named segments. SATs bridge the gap between oversimplified global summaries and overly detailed local explanations, offering a practical tool for analysing and debugging image classifiers.
https://arxiv.org/abs/2506.23247
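One plausible reading of how Segment Attribution Tables, described in the abstract above, turn local saliency maps into per-segment summaries is sketched below. The aggregation rule (share of absolute saliency per named segment, averaged over images) and all function names are assumptions for illustration, since the abstract does not give the exact formula.

```python
import numpy as np

def segment_attribution(saliency, segments, names):
    """Share of total absolute saliency that falls in each named segment.

    saliency: 2D array of pixel attributions for one image.
    segments: 2D integer array of segment ids with the same shape.
    names:    mapping from segment id to a human-readable name.
    """
    saliency = np.abs(np.asarray(saliency, dtype=np.float64))
    segments = np.asarray(segments)
    total = saliency.sum()
    shares = {}
    for seg_id, name in names.items():
        mass = saliency[segments == seg_id].sum()
        shares[name] = float(mass / total) if total > 0 else 0.0
    return shares

def attribution_table(rows):
    """Average per-image shares into one value per segment name."""
    table = {}
    for shares in rows:
        for name, value in shares.items():
            table.setdefault(name, []).append(value)
    return {name: float(np.mean(values)) for name, values in table.items()}

if __name__ == "__main__":
    names = {0: "background", 1: "eyes"}
    segs = np.array([[0, 0], [1, 1]])
    rows = [
        segment_attribution([[1, 1], [2, 0]], segs, names),
        segment_attribution([[3, 1], [0, 0]], segs, names),
    ]
    print(attribution_table(rows))
```

A consistently large share for a segment such as "background" across many images would be the kind of signal the abstract associates with spurious correlations.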
In human-in-the-loop systems such as teleoperation, especially those involving heavy-duty manipulators, achieving high task performance requires both robust control and strong human engagement. This paper presents a bilateral teleoperation framework that enhances the operator's Sense of Embodiment (SoE), specifically the senses of agency and self-location, through an immersive virtual reality interface and distributed haptic feedback via an exoskeleton. To support this embodiment and establish a high level of motion and force transparency, we develop a force-sensorless, robust control architecture that tackles input nonlinearities, master-slave asymmetries, unknown uncertainties, and arbitrary time delays. A human-robot augmented dynamic model is integrated into the control loop to enhance the controller's adaptability to the human operator. Theoretical analysis confirms semi-global uniform ultimate boundedness of the closed-loop system. Extensive real-world experiments demonstrate high-accuracy tracking under up to 1:13 motion scaling and 1:1000 force scaling. Additionally, the stability-transparency tradeoff for motion tracking and force reflection tracking is established for up to 150 ms of one-way fixed and time-varying communication delay. A user study with 10 participants (9 male and 1 female) showed that the system provides a good level of SoE (76.4%) while remaining user-friendly, with no gender limitation. These results are significant given the scale and weight of heavy-duty manipulators.
https://arxiv.org/abs/2505.14486
We present a real-time, non-learning depth estimation method that fuses Light Detection and Ranging (LiDAR) data with stereo camera input. Our approach comprises three key techniques: Semi-Global Matching (SGM) stereo with Discrete Disparity-matching Cost (DDC), semidensification of LiDAR disparity, and a consistency check that combines stereo images and LiDAR data. Each of these components is designed for parallelization on a GPU to realize real-time performance. When evaluated on the KITTI dataset, the proposed method achieved an error rate of 2.79%, outperforming the previous state-of-the-art real-time stereo-LiDAR fusion method, which had an error rate of 3.05%. Furthermore, we tested the proposed method in various scenarios, including different LiDAR point densities, varying weather conditions, and indoor environments, to demonstrate its high adaptability. We believe that the real-time and non-learning nature of our method makes it highly practical for applications in robotics and automation.
https://arxiv.org/abs/2504.05148
Manual pruning of radiata pine trees presents significant safety risks due to their substantial height and the challenging terrains in which they thrive. To address these risks, this research proposes the development of a drone-based pruning system equipped with specialized pruning tools and a stereo vision camera, enabling precise detection and trimming of branches. Deep learning algorithms, including YOLO and Mask R-CNN, are employed to ensure accurate branch detection, while the Semi-Global Matching algorithm is integrated to provide reliable distance estimation. The synergy between these techniques facilitates the precise identification of branch locations and enables efficient, targeted pruning. Experimental results demonstrate that the combined implementation of YOLO and SGBM enables the drone to accurately detect branches and measure their distances from the drone. This research not only improves the safety and efficiency of pruning operations but also makes a significant contribution to the advancement of drone technology in the automation of agricultural and forestry practices, laying a foundational framework for further innovations in environmental management.
https://arxiv.org/abs/2409.17526
Heavy-duty operations, typically performed using heavy-duty hydraulic manipulators (HHMs), are susceptible to environmental contact due to tracking errors or sudden environmental changes. Therefore, beyond precise control design, it is crucial that the manipulator be resilient to potential impacts without relying on contact-force sensors, which in most cases cannot be used. This paper proposes a novel force-sensorless, robust, impact-resilient controller for a generic 6-degree-of-freedom (DoF) HHM consisting of anthropomorphic arm and spherical wrist mechanisms. The scheme consists of a neuroadaptive subsystem-based impedance controller, designed to ensure accurate tracking of position and orientation together with stabilization of the HHM upon contact, and a novel generalized momentum observer, introduced for the first time in Plücker coordinates, to estimate the impact force. Finally, by leveraging the concepts of virtual stability and virtual power flow, semi-global uniform ultimate boundedness of the entire system is assured. To demonstrate the efficacy and versatility of the proposed method, extensive experiments were conducted using a generic 6-DoF industrial HHM. The experimental results confirm the performance of the designed method, which achieves subcentimeter tracking accuracy and an 80% reduction in contact impact.
https://arxiv.org/abs/2408.09147
Crowd counting is gaining societal relevance, particularly in the domains of urban planning, crowd management, and public safety. This paper introduces Fourier-guided attention (FGA), a novel attention mechanism for crowd count estimation designed to address the inefficient capture of full-scale global patterns in existing convolution-based attention networks. FGA efficiently captures multi-scale information, including full-scale global patterns, by using Fast Fourier Transforms (FFT) along with spatial attention for global features, and convolutions with channel-wise attention for semi-global and local features. The architecture of FGA involves a dual-path approach: (1) a path for processing full-scale global features through FFT, allowing for efficient extraction of information in the frequency domain, and (2) a path for processing the remaining feature maps for semi-global and local features using traditional convolutions and channel-wise attention. This dual-path architecture enables FGA to integrate frequency and spatial information, enhancing its ability to capture diverse crowd patterns. We apply FGA in the last layers of two popular crowd-counting works, CSRNet and CANNet, to evaluate the module's performance on benchmark datasets such as ShanghaiTech-A, ShanghaiTech-B, UCF-CC-50, and JHU++ crowd. The experiments demonstrate a notable improvement across all datasets based on Mean-Squared-Error (MSE) and Mean-Absolute-Error (MAE) metrics, showing comparable performance to recent state-of-the-art methods. Additionally, we illustrate the interpretability using qualitative analysis, leveraging Grad-CAM heatmaps, to show the effectiveness of FGA in capturing crowd patterns.
https://arxiv.org/abs/2407.06110
Palm oil production has been identified as one of the major drivers of deforestation in tropical countries. To meet supply chain objectives, commodity producers and other stakeholders need timely information on land cover dynamics in their supply shed. However, such data are difficult to obtain from suppliers, who may lack digital geographic representations of their supply sheds and production locations. Here we present a "community model," a machine learning model trained on pooled data sourced from many different stakeholders, to develop a specific land cover probability map, in this case a semi-global oil palm map. Advantages of this method are the inclusion of varied inputs, the ability to easily update the model as new training data become available, and the ability to run the model on any year for which input imagery is available. Inclusion of diverse data sources in one probability map can help establish a shared understanding across stakeholders of the presence and absence of a land cover or commodity (in this case oil palm). The model predictors are annual composites built from publicly available satellite imagery provided by Sentinel-1, Sentinel-2, and ALOS DSM. We provide map outputs as the probability of palm in a given pixel, to reflect the uncertainty of the underlying state (palm or not palm). The initial version of this model provides global accuracy estimated to be approximately 90% (at a 0.5 probability threshold) from spatially partitioned test data. This model and the resulting oil palm probability map products are useful for accurately identifying the geographic footprint of palm cultivation. Used in conjunction with timely deforestation information, this palm model is useful for understanding the risk of continued oil palm plantation expansion in sensitive forest areas.
https://arxiv.org/abs/2405.09530
In the reconstruction of façade elements, the identification of specific object types remains challenging and is often circumvented by rectangularity assumptions or the use of bounding boxes. We propose a new approach for the reconstruction of 3D façade details. We combine mobile laser scanning (MLS) point clouds and a pre-defined 3D model library using a bag-of-words (BoW) concept, which we augment by incorporating semi-global features. We conduct experiments on the models superimposed with random noise and on the TUM-FAÇADE dataset. Our method demonstrates promising results, improving on the conventional BoW approach. It has the potential to be used for more realistic façade reconstruction without rectangularity assumptions, in applications such as testing automated driving functions or estimating façade solar potential.
https://arxiv.org/abs/2402.06521
Vast industrial investment, along with increased academic research on hydraulic heavy-duty manipulators, has paved the way for their automation, necessitating the design of robust and high-precision controllers. In this study, an orchestrated robust controller is designed to address this need. To do so, the entire robotic system is decomposed into subsystems using virtual decomposition control (VDC), and a robust controller is designed at each local subsystem by considering unknown model uncertainties, unknown disturbances, and compound input constraints. Radial basis function neural networks (RBFNNs) are incorporated into VDC to tackle the unknown disturbances and uncertainties, resulting in novel decentralized RBFNNs. The robust local controllers designed at each subsystem are then orchestrated to accomplish high-precision control. For the first time in the context of VDC, semi-global uniform ultimate boundedness is achieved under the designed controller. The validity of the theoretical results is verified by extensive simulations and experiments on a 6-degrees-of-freedom industrial manipulator with a nominal lifting capacity of 600 kg at 5 m reach. Comparison of the simulation results with a state-of-the-art controller, together with the experimental results, demonstrates that the proposed method delivers the promised performance.
https://arxiv.org/abs/2312.06304
In Ultrasound Localization Microscopy (ULM), achieving high-resolution images relies on the precise localization of contrast agent particles across consecutive beamformed frames. However, our study uncovers an enormous potential: The process of delay-and-sum beamforming leads to an irreversible reduction of Radio-Frequency (RF) data, while its implications for localization remain largely unexplored. The rich contextual information embedded within RF wavefronts, including their hyperbolic shape and phase, offers great promise for guiding Deep Neural Networks (DNNs) in challenging localization scenarios. To fully exploit this data, we propose to directly localize scatterers in RF signals. Our approach involves a custom super-resolution DNN using learned feature channel shuffling and a novel semi-global convolutional sampling block tailored for reliable and accurate localization in RF input data. Additionally, we introduce a geometric point transformation that facilitates seamless mapping between B-mode and RF spaces. To validate the effectiveness of our method and understand the impact of beamforming, we conduct an extensive comparison with State-Of-The-Art (SOTA) techniques in ULM. We present the inaugural in vivo results from an RF-trained DNN, highlighting its real-world practicality. Our findings show that RF-ULM bridges the domain gap between synthetic and real datasets, offering a considerable advantage in terms of precision and complexity. To enable the broader research community to benefit from our findings, our code and the associated SOTA methods are made available at this https URL.
https://arxiv.org/abs/2310.01545
Digital surface model generation using traditional multi-view stereo matching (MVS) performs poorly over non-Lambertian surfaces, with asynchronous acquisitions, or at discontinuities. Neural radiance fields (NeRF) offer a new paradigm for reconstructing surface geometries using a continuous volumetric representation. NeRF is self-supervised, does not require ground truth geometry for training, and provides an elegant way to include physical parameters of the scene in its representation, thus potentially remedying the challenging scenarios where MVS fails. However, NeRF and its variants require many views to produce convincing scene geometries, and many views are rarely available in Earth observation satellite imaging. In this paper we present SparseSat-NeRF (SpS-NeRF), an extension of Sat-NeRF adapted to sparse satellite views. SpS-NeRF employs dense depth supervision guided by the cross-correlation similarity metric provided by traditional semi-global MVS matching. We demonstrate the effectiveness of our approach on stereo and tri-stereo Pleiades 1B/WorldView-3 images, and compare against NeRF and Sat-NeRF. The code is available at this https URL.
https://arxiv.org/abs/2309.00277
This paper proposes a nonlinear stochastic complementary filter design for inertial navigation that fuses Ultra-wideband (UWB) and Inertial Measurement Unit (IMU) measurements and ensures semi-global uniform ultimate boundedness (SGUUB) of the closed-loop error signals in mean square. The proposed filter estimates the vehicle's orientation, position, linear velocity, and noise covariance. The filter is designed to mimic the nonlinear navigation motion kinematics and is posed on a matrix Lie group, the extended form of the Special Euclidean Group $\mathbb{SE}_{2}\left(3\right)$. The Lie-group-based structure of the proposed filter provides a unique and global representation, avoiding singularity (a common shortcoming of Euler angles) as well as non-uniqueness (a common limitation of unit quaternions). Unlike Kalman-type filters, the proposed filter addresses IMU measurement noise with unknown upper-bounded covariance. Although the navigation estimator is proposed in continuous form, the discrete version is also presented. Moreover, a unit-quaternion implementation is provided in the Appendix. Experimental validation is performed using a publicly available real-world six-degrees-of-freedom (6 DoF) flight dataset obtained from an unmanned Micro Aerial Vehicle (MAV), illustrating the robustness of the proposed navigation technique. Keywords: Sensor-fusion, Inertial navigation, Ultra-wideband ranging, Inertial measurement unit, Stochastic differential equation, Stability, Localization, Observer design.
https://arxiv.org/abs/2308.13393
Time of Flight (ToF) is a prevalent depth sensing technology in the fields of robotics, medical imaging, and non-destructive testing. Yet, ToF sensing faces challenges from complex ambient conditions making an inverse modelling from the sparse temporal information intractable. This paper highlights the potential of modern super-resolution techniques to learn varying surroundings for a reliable and accurate ToF detection. Unlike existing models, we tailor an architecture for sub-sample precise semi-global signal localization by combining super-resolution with an efficient residual contraction block to balance between fine signal details and large scale contextual information. We consolidate research on ToF by conducting a benchmark comparison against six state-of-the-art methods for which we employ two publicly available datasets. This includes the release of our SToF-Chirp dataset captured by an airborne ultrasound transducer. Results showcase the superior performance of our proposed StofNet in terms of precision, reliability and model complexity. Our code is available at this https URL.
https://arxiv.org/abs/2308.12009