Ethological research increasingly benefits from the growing affordability and accessibility of drones, which enable the capture of high-resolution footage of animal movement at fine spatial and temporal scales. However, analyzing such footage presents the technical challenge of separating animal movement from drone motion. While the problem is non-trivial, computer vision techniques such as image registration and Structure-from-Motion (SfM) offer practical solutions. For conservationists, open-source tools that are user-friendly, require minimal setup, and deliver timely results are especially valuable for efficient data interpretation. This study evaluates three approaches: a bioimaging-based registration technique, an SfM pipeline, and a hybrid interpolation method. We apply these to a recorded escape event involving 44 plains zebras, captured in a single drone video. Using the best-performing method, we extract individual trajectories and identify key behavioral patterns: increased alignment (polarization) during escape, a brief widening of spacing just before stopping, and tighter coordination near the group's center. These insights highlight the method's effectiveness and its potential to scale to larger datasets, contributing to broader investigations of collective animal behavior.
https://arxiv.org/abs/2505.16882
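To make the behavioral measures concrete: once per-individual trajectories are registered to a common frame, the polarization and spacing statistics named in the abstract reduce to a few lines of array math. The sketch below is illustrative only, not the authors' pipeline; the 44 positions are random stand-ins for per-frame zebra coordinates.

```python
import numpy as np

def polarization(headings):
    """Norm of the mean unit heading vector: 0 = disordered, 1 = fully aligned."""
    return float(np.linalg.norm(headings.mean(axis=0)))

def mean_nearest_neighbor_distance(positions):
    """Average distance from each individual to its closest neighbor."""
    d = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # ignore self-distances
    return float(d.min(axis=1).mean())

# Stand-in data: 44 individuals across two consecutive registered frames.
pos_t0 = np.random.rand(44, 2) * 50.0
pos_t1 = pos_t0 + np.random.randn(44, 2)
v = pos_t1 - pos_t0
headings = v / np.linalg.norm(v, axis=1, keepdims=True)
print(polarization(headings), mean_nearest_neighbor_distance(pos_t1))
```

Tracking these two quantities over time is what reveals patterns like rising polarization during escape and the brief widening of spacing before stopping.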
Unmanned Aerial Vehicles (UAVs) are evolving into language-interactive platforms, enabling more intuitive forms of human-drone interaction. While prior works have primarily focused on high-level planning and long-horizon navigation, we shift attention to language-guided fine-grained trajectory control, where UAVs execute short-range, reactive flight behaviors in response to language instructions. We formalize this problem as the Flying-on-a-Word (Flow) task and introduce UAV imitation learning as an effective approach. In this framework, UAVs learn fine-grained control policies by mimicking expert pilot trajectories paired with atomic language instructions. To support this paradigm, we present UAV-Flow, the first real-world benchmark for language-conditioned, fine-grained UAV control. It includes a task formulation, a large-scale dataset collected in diverse environments, a deployable control framework, and a simulation suite for systematic evaluation. Our design enables UAVs to closely imitate the precise, expert-level flight trajectories of human pilots and supports direct deployment without a sim-to-real gap. We conduct extensive experiments on UAV-Flow, benchmarking VLN and VLA paradigms. Results show that VLA models are superior to VLN baselines and highlight the critical role of spatial grounding in the fine-grained Flow setting.
https://arxiv.org/abs/2505.15725
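The imitation-learning core of the Flow task can be pictured as behavior cloning conditioned on an instruction embedding. The sketch below is a hedged toy, not the UAV-Flow codebase: the MLP policy, the dimensions, the MSE loss, and the frozen text-encoder assumption are all illustrative choices.

```python
import torch
import torch.nn as nn

class LangConditionedPolicy(nn.Module):
    """Maps (state, instruction embedding) -> a low-level flight command."""
    def __init__(self, obs_dim=32, lang_dim=128, act_dim=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + lang_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, act_dim))       # e.g., body rates + thrust

    def forward(self, obs, lang_emb):
        return self.net(torch.cat([obs, lang_emb], dim=-1))

policy = LangConditionedPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)

# One behavior-cloning step on a (state, instruction, expert action) batch.
obs = torch.randn(64, 32)
lang = torch.randn(64, 128)        # stand-in for frozen text-encoder output
expert_act = torch.randn(64, 4)    # actions logged from expert pilots
loss = nn.functional.mse_loss(policy(obs, lang), expert_act)
opt.zero_grad(); loss.backward(); opt.step()
```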
Drones have become essential in various applications, but conventional quadrotors face limitations in confined spaces and complex tasks. Deformable drones, which can adapt their shape in real-time, offer a promising solution to overcome these challenges, while also enhancing maneuverability and enabling novel tasks like object grasping. This paper presents a novel approach to autonomous motion planning and control for deformable quadrotors. We introduce a shape-adaptive trajectory planner that incorporates deformation dynamics into path generation, using a scalable kinodynamic A* search to handle deformation parameters in complex environments. The backend spatio-temporal optimization is capable of generating optimally smooth trajectories that incorporate shape deformation. Additionally, we propose an enhanced control strategy that compensates for external forces and torque disturbances, achieving a 37.3% reduction in trajectory tracking error compared to our previous work. Our approach is validated through simulations and real-world experiments, demonstrating its effectiveness in narrow-gap traversal and multi-modal deformable tasks.
https://arxiv.org/abs/2505.15010
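To make the "deformation parameters in the search" idea concrete, here is a toy grid version: the search state carries a discrete shape variable, changing shape costs extra, and a one-cell gap only becomes passable in the contracted shape. The paper's planner is kinodynamic and continuous; none of this code is theirs, and the footprint model, costs, and grid are assumptions.

```python
import heapq

def passable(x, y, shape, grid):
    """Footprint cells must be obstacle-free.
    shape 0 = extended (2 cells tall), shape 1 = contracted (1 cell)."""
    height = 2 if shape == 0 else 1
    return all(grid.get((x, y + dy), 0) == 0 for dy in range(height))

def plan(start, goal, grid, deform_cost=0.5):
    """A*-style search over (x, y, shape); shape changes cost extra effort."""
    openq = [(0.0, start)]
    g, came = {start: 0.0}, {}
    while openq:
        _, s = heapq.heappop(openq)
        if s[:2] == goal:
            path = [s]
            while s in came:
                s = came[s]
                path.append(s)
            return path[::-1]
        x, y, shape = s
        for nxt in [(x + 1, y, shape), (x - 1, y, shape), (x, y + 1, shape),
                    (x, y - 1, shape), (x, y, 1 - shape)]:
            if not passable(*nxt, grid):
                continue
            step = 1.0 + (deform_cost if nxt[2] != shape else 0.0)
            if g[s] + step < g.get(nxt, float("inf")):
                g[nxt] = g[s] + step
                came[nxt] = s
                h = abs(nxt[0] - goal[0]) + abs(nxt[1] - goal[1])  # Manhattan
                heapq.heappush(openq, (g[nxt] + h, nxt))
    return None

# A wall at x=3 with a single free cell at y=2 forces a contraction to pass.
grid = {(3, y): 1 for y in range(6) if y != 2}
print(plan((0, 2, 0), (6, 2), grid))
```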
Evolution and learning have historically been interrelated topics, and their interplay is attracting increased interest lately. The emerging new factor in this trend is morphological evolution, the evolution of physical forms within embodied AI systems such as robots. In this study, we investigate a system of hexacopter-type drones with evolvable morphologies and learnable controllers and make contributions to two fields. For aerial robotics, we demonstrate that the combination of evolution and learning can deliver non-conventional drones that significantly outperform the traditional hexacopter on several tasks that are more complex than previously considered in the literature. For the field of Evolutionary Computing, we introduce novel metrics and perform new analyses into the interaction of morphological evolution and learning, uncovering hitherto unidentified effects. Our analysis tools are domain-agnostic, making a methodological contribution towards building solid foundations for embodied AI systems that integrate evolution and learning.
https://arxiv.org/abs/2505.14129
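The evolution-plus-learning structure the paper analyzes follows a familiar two-loop pattern: an outer evolutionary search over morphologies, with an inner learning loop that adapts a controller to each body before fitness is scored. The sketch below shows only that loop structure; the fitness function, genome encoding, and all numbers are placeholders, not the study's setup.

```python
import random

def learn_controller(morphology, budget=20):
    """Inner loop: stand-in for controller learning on one morphology.
    Returns the best task reward found (the 'learned fitness')."""
    best = random.random() * sum(morphology)   # placeholder reward model
    for _ in range(budget):
        best = max(best, best + random.gauss(0, 0.1))
    return best

# Outer loop: hypothetical genome of six arm lengths for a hexacopter-like body.
population = [[random.uniform(0.5, 1.5) for _ in range(6)] for _ in range(16)]
for generation in range(10):
    scored = sorted(population, key=learn_controller, reverse=True)
    parents = scored[:8]                       # truncation selection
    population = parents + [
        [g + random.gauss(0, 0.05) for g in random.choice(parents)]
        for _ in range(8)]                     # Gaussian mutation of offspring
```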
This paper presents Duawlfin, a drone with unified actuation for wheeled locomotion and flight operation that achieves efficient, bidirectional ground mobility. Unlike existing hybrid designs, Duawlfin eliminates the need for additional actuators or propeller-driven ground propulsion by leveraging only its standard quadrotor motors and introducing a differential drivetrain with one-way bearings. This innovation simplifies the mechanical system, significantly reduces energy usage, and prevents the disturbance caused by propellers spinning near the ground, such as dust interference with sensors. In addition, the one-way bearings minimize the power transfer from motors to propellers in ground mode, which enables the vehicle to operate safely near humans. We provide a detailed mechanical design, present control strategies for rapid and smooth mode transitions, and validate the concept through extensive experimental testing. Flight-mode tests confirm stable aerial performance comparable to conventional quadcopters, while ground-mode experiments demonstrate efficient slope climbing (up to 30°) and agile turning maneuvers approaching 1 g lateral acceleration. The seamless transitions between aerial and ground modes further underscore the practicality and effectiveness of our approach for applications like urban logistics and indoor navigation. All materials, including 3D model files, a demonstration video, and other assets, are open-sourced at this https URL.
https://arxiv.org/abs/2505.13836
Sim-to-real transfer is a fundamental challenge in robot reinforcement learning. Discrepancies between simulation and reality can significantly impair policy performance, especially if it receives high-dimensional inputs such as dense depth estimates from vision. We propose a novel depth transfer method based on domain adaptation to bridge the visual gap between simulated and real-world depth data. A Variational Autoencoder (VAE) is first trained to encode ground-truth depth images from simulation into a latent space, which serves as input to a reinforcement learning (RL) policy. During deployment, the encoder is refined to align stereo depth images with this latent space, enabling direct policy transfer without fine-tuning. We apply our method to the task of autonomous drone navigation through cluttered environments. Experiments in IsaacGym show that our method nearly doubles the obstacle avoidance success rate when switching from ground-truth to stereo depth input. Furthermore, we demonstrate successful transfer to the photo-realistic simulator AvoidBench using only IsaacGym-generated stereo data, achieving superior performance compared to state-of-the-art baselines. Real-world evaluations in both indoor and outdoor environments confirm the effectiveness of our approach, enabling robust and generalizable depth-based navigation across diverse domains.
https://arxiv.org/abs/2505.12428
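The transfer recipe in the abstract has two stages: encode ground-truth sim depth into a latent that the RL policy consumes, then refine a stereo-depth encoder so its latents match. A hedged sketch of the second stage follows; the conv architecture, image size, and pairing of ground-truth with noisy "stereo" frames are assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class DepthEncoder(nn.Module):
    """Toy conv encoder: 64x64 depth map -> latent vector."""
    def __init__(self, latent_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),   # -> 32x32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # -> 16x16
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, latent_dim))

    def forward(self, depth):
        return self.net(depth)

enc_gt = DepthEncoder().eval()     # stands in for the trained VAE encoder
enc_stereo = DepthEncoder()        # refined so stereo maps to the same latent
opt = torch.optim.Adam(enc_stereo.parameters(), lr=1e-4)

gt = torch.rand(8, 1, 64, 64)                  # sim ground-truth depth
stereo = gt + 0.05 * torch.randn_like(gt)      # paired noisy "stereo" depth
with torch.no_grad():
    target = enc_gt(gt)                        # frozen latent targets
loss = nn.functional.mse_loss(enc_stereo(stereo), target)
opt.zero_grad(); loss.backward(); opt.step()
```

Because the policy only ever sees the latent, aligning the stereo encoder to the ground-truth latent space lets the policy transfer without fine-tuning.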
Cross-view geo-localization (CVGL) aims to match images of the same geographic location captured from different perspectives, such as drones and satellites. Despite recent advances, CVGL remains highly challenging due to significant appearance changes and spatial distortions caused by viewpoint variations. Existing methods typically assume that cross-view images can be directly aligned within a shared feature space by maximizing feature similarity through contrastive learning. Nonetheless, this assumption overlooks the inherent conflicts induced by viewpoint discrepancies, resulting in extracted features containing inconsistent information that hinders precise localization. In this study, we take a manifold learning perspective and model the feature space of cross-view images as a composite manifold jointly governed by content and viewpoint information. Building upon this insight, we propose $\textbf{CVD}$, a new CVGL framework that explicitly disentangles $\textit{content}$ and $\textit{viewpoint}$ factors. To promote effective disentanglement, we introduce two constraints: $\textit{(i)}$ An intra-view independence constraint, which encourages statistical independence between the two factors by minimizing their mutual information. $\textit{(ii)}$ An inter-view reconstruction constraint that reconstructs each view by cross-combining $\textit{content}$ and $\textit{viewpoint}$ from paired images, ensuring factor-specific semantics are preserved. As a plug-and-play module, CVD can be seamlessly integrated into existing geo-localization pipelines. Extensive experiments on four benchmarks, i.e., University-1652, SUES-200, CVUSA, and CVACT, demonstrate that CVD consistently improves both localization accuracy and generalization across multiple baselines.
https://arxiv.org/abs/2505.11822
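A sketch of CVD's two constraints under simplifying assumptions: mutual information is approximated here by a cross-covariance penalty (the paper minimizes mutual information proper), and the encoders and decoder are toy linear layers on pre-extracted features rather than full backbones. Everything below is illustrative.

```python
import torch
import torch.nn as nn

f_content = nn.Linear(512, 128)    # content-factor head
f_view = nn.Linear(512, 128)       # viewpoint-factor head
decoder = nn.Linear(256, 512)      # reconstructs a view from [content, view]

drone_feat = torch.randn(32, 512)  # paired cross-view features (stand-ins)
sat_feat = torch.randn(32, 512)

c_d, v_d = f_content(drone_feat), f_view(drone_feat)
c_s, v_s = f_content(sat_feat), f_view(sat_feat)

# (i) Intra-view independence, proxied here by driving the cross-covariance
# of the two factors toward zero (stand-in for minimizing mutual information).
cd, vd = c_d - c_d.mean(0), v_d - v_d.mean(0)
loss_indep = (cd.T @ vd / len(cd)).pow(2).mean()

# (ii) Inter-view reconstruction: content from one view plus viewpoint from
# the other should reconstruct that other view's features.
recon_sat = decoder(torch.cat([c_d, v_s], dim=-1))
recon_drone = decoder(torch.cat([c_s, v_d], dim=-1))
loss_recon = (nn.functional.mse_loss(recon_sat, sat_feat)
              + nn.functional.mse_loss(recon_drone, drone_feat))

loss = loss_recon + 0.1 * loss_indep   # weighting is a free hyperparameter
```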
Wildlife-induced crop damage, particularly from deer, threatens agricultural productivity. Traditional deterrence methods often fall short in scalability, responsiveness, and adaptability to diverse farmland environments. This paper presents an integrated unmanned aerial vehicle (UAV) system designed for autonomous wildlife deterrence, developed as part of the Farm Robotics Challenge. Our system combines a YOLO-based real-time computer vision module for deer detection, an energy-efficient coverage path planning algorithm for efficient field monitoring, and an autonomous charging station for continuous operation of the UAV. In collaboration with a local Minnesota farmer, the system is tailored to address practical constraints such as terrain, infrastructure limitations, and animal behavior. The solution is evaluated through a combination of simulation and field testing, demonstrating robust detection accuracy, efficient coverage, and extended operational time. The results highlight the feasibility and effectiveness of drone-based wildlife deterrence in precision agriculture, offering a scalable framework for future deployment and extension.
https://arxiv.org/abs/2505.10770
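The coverage-path component of such a system is often a boustrophedon (lawnmower) sweep. The minimal sketch below is not the paper's energy-aware planner; it only gestures at the energy trade-off through the swath-spacing parameter (wider spacing gives a shorter path at the risk of coverage gaps).

```python
def coverage_waypoints(width_m, height_m, spacing_m, altitude_m=30.0):
    """Back-and-forth sweep of a rectangular field; returns (x, y, z) waypoints."""
    waypoints = []
    x, direction = 0.0, 1
    while x <= width_m:
        y0, y1 = (0.0, height_m) if direction > 0 else (height_m, 0.0)
        waypoints.append((x, y0, altitude_m))
        waypoints.append((x, y1, altitude_m))
        x += spacing_m       # set from the camera footprint for full coverage
        direction *= -1      # alternate sweep direction each pass
    return waypoints

# e.g., a hypothetical 200 m x 120 m field with 25 m swath spacing
path = coverage_waypoints(200, 120, 25)
print(len(path), "waypoints; first three:", path[:3])
```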
Recent advancements in deep learning and aerial imaging have transformed wildlife monitoring, enabling researchers to survey wildlife populations at unprecedented scales. Unmanned Aerial Vehicles (UAVs) provide a cost-effective means of capturing high-resolution imagery, particularly for monitoring densely populated seabird colonies. In this study, we assess the performance of a general-purpose avian detection model, BirdDetector, in estimating the breeding population of Salvin's albatross (Thalassarche salvini) on the Bounty Islands, New Zealand. Using drone-derived imagery, we evaluate the model's effectiveness in both zero-shot and fine-tuned settings, incorporating enhanced inference techniques and stronger augmentation methods. Our findings indicate that while applying the model in a zero-shot setting offers a strong baseline, fine-tuning with annotations from the target domain and stronger image augmentation leads to marked improvements in detection accuracy. These results highlight the potential of leveraging pre-trained deep-learning models for species-specific monitoring in remote and challenging environments.
https://arxiv.org/abs/2505.10737
The fields of autonomous systems and robotics are receiving considerable attention in civil applications such as construction, logistics, and firefighting. Nevertheless, the widespread adoption of these technologies is hindered by the necessity for robust processing units to run AI models. Edge-AI solutions offer considerable promise, enabling low-power, cost-effective robotics that can automate civil services, improve safety, and enhance sustainability. This paper presents a novel Edge-AI-enabled drone-based surveillance system for autonomous multi-robot operations at construction sites. Our system integrates a lightweight MCU-based object detection model within a custom-built UAV platform and a 5G-enabled multi-agent coordination infrastructure. We specifically target the real-time obstacle detection and dynamic path planning problem in construction environments, providing a comprehensive dataset specifically created for MCU-based edge applications. Field experiments demonstrate practical viability and identify optimal operational parameters, highlighting our approach's scalability and computational efficiency advantages compared to existing UAV solutions. The present and future roles of autonomous vehicles on construction sites are also discussed, as well as the effectiveness of edge-AI solutions. We share our dataset publicly at this http URL
https://arxiv.org/abs/2505.09837
Drones are promising for data collection in precision agriculture; however, they are limited by their battery capacity, so efficient path planners are required. This paper presents a drone path planner trained using Reinforcement Learning (RL) on an abstract simulation that uses object detections and uncertain prior knowledge. The RL agent controls the flight direction and can terminate the flight. By using the agent in combination with the drone's flight controller and a detection network to process camera images, it is possible to evaluate the performance of the agent on real-world data. In simulation, the agent yielded on average a 78% shorter flight path compared to a full-coverage planner, at the cost of a 14% lower recall. On real-world data, the agent showed a 72% shorter flight path compared to a full-coverage planner, however at the cost of a 25% lower recall. The lower performance on real-world data was attributed to the real-world object distribution and the lower accuracy of prior knowledge, and shows potential for improvement. Overall, we conclude that for applications where it is not crucial to find all objects, such as weed detection, the learning-based path planner is suitable and efficient.
https://arxiv.org/abs/2505.09278
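The distinctive part of the agent's interface is that early termination is itself an action, which is how the policy trades path length against recall. The sketch below shows that interface only; the environment internals, detection probability, and baseline policy are placeholders, not the paper's simulator.

```python
import random

ACTIONS = ["north", "east", "south", "west", "terminate"]
MOVES = {"north": (0, 1), "east": (1, 0), "south": (0, -1), "west": (-1, 0)}

def run_episode(policy, max_steps=200):
    pos, path_len, detections = (0, 0), 0, 0
    for _ in range(max_steps):
        action = policy(pos)
        if action == "terminate":          # agent may end the flight early
            break
        dx, dy = MOVES[action]
        pos = (pos[0] + dx, pos[1] + dy)
        path_len += 1
        detections += random.random() < 0.1   # stand-in for new objects found
    return path_len, detections

# Trivial baseline: sweep east, stop after 50 steps. A learned policy would
# instead weigh expected remaining detections (from prior knowledge) against
# the battery cost of continuing.
state = {"t": 0}
def policy(pos):
    state["t"] += 1
    return "terminate" if state["t"] > 50 else "east"

print(run_episode(policy))
```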
Recovering a drone on a disturbed water surface remains a significant challenge in maritime robotics. In this paper, we propose a unified framework for Robot-Assisted Drone Recovery on a Wavy Surface that addresses two major tasks: first, accurate prediction of a moving drone's position under wave-induced disturbances using an Error-State Kalman Filter (ESKF), and second, effective motion planning for a manipulator via Receding Horizon Control (RHC). Specifically, the ESKF predicts the drone's future position 0.5 s ahead, while the manipulator plans a capture trajectory in real time, thus overcoming not only wave-induced base motions but also the manipulator's torque limits. We provide a system design that comprises a manipulator subsystem and a UAV subsystem. On the UAV side, we detail how position control and suspended payload strategies are implemented. On the manipulator side, we show how an RHC scheme outperforms traditional low-level control algorithms. Simulation and real-world experiments using wave-disturbed motion data demonstrate that our approach achieves a success rate above 95% and outperforms conventional baseline methods by up to 10% in efficiency and 20% in precision. The results underscore the feasibility and robustness of our system, which achieves state-of-the-art (SOTA) performance and offers a practical solution for maritime drone operations.
https://arxiv.org/abs/2505.09145
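The "predict 0.5 s ahead" step is, at its core, a Kalman-style propagation of the state estimate through the motion model. The sketch below shows only that prediction step for a constant-velocity model; the paper's filter is a full error-state formulation with wave-disturbance effects, and the numbers here are arbitrary.

```python
import numpy as np

dt = 0.5                                          # look-ahead horizon (s)
F = np.block([[np.eye(3), dt * np.eye(3)],
              [np.zeros((3, 3)), np.eye(3)]])     # [pos; vel] transition
Q = 0.01 * np.eye(6)                              # process noise (tuned in practice)

x = np.array([1.0, 2.0, 3.0, 0.2, -0.1, 0.05])    # current state estimate
P = 0.1 * np.eye(6)                               # current covariance

x_pred = F @ x                  # predicted state; position is x_pred[:3]
P_pred = F @ P @ F.T + Q        # uncertainty grows with the look-ahead
print("predicted position in 0.5 s:", x_pred[:3])
```

The manipulator's receding-horizon planner then aims its capture trajectory at x_pred rather than the current position, which is what buys time against the wave-induced motion.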
This paper proposes a low-overhead, vision-based 3D scene reconstruction framework for drones, named ExploreGS. By using RGB images, ExploreGS replaces the traditional lidar-based point-cloud acquisition process with a vision model, achieving a high-quality reconstruction at a lower cost. The framework integrates scene exploration and model reconstruction, and leverages a Bag-of-Words (BoW) model to enable real-time processing, so that 3D Gaussian Splatting (3DGS) training can be executed on-board. Comprehensive experiments in both simulation and real-world environments demonstrate the efficiency and applicability of the ExploreGS framework on resource-constrained devices, while maintaining reconstruction quality comparable to state-of-the-art methods.
https://arxiv.org/abs/2505.10578
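For readers unfamiliar with why a Bag-of-Words model enables real-time processing: it turns each frame's local descriptors into one fixed-length histogram, making frame matching a cheap dot product. The sketch below is generic BoW, not ExploreGS's implementation; the random arrays stand in for ORB descriptors that would in practice come from cv2.ORB_create().detectAndCompute(frame, None).

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-ins for per-frame ORB descriptors (500 keypoints x 32 bytes each).
rng = np.random.default_rng(0)
train_descriptors = [rng.random((500, 32), dtype=np.float32) for _ in range(5)]

# Offline: cluster all training descriptors into a visual vocabulary.
vocab = KMeans(n_clusters=64, n_init=3, random_state=0)
vocab.fit(np.vstack(train_descriptors))

def bow_signature(descriptors):
    """Quantize one frame's descriptors into a normalized word histogram."""
    words = vocab.predict(descriptors)
    hist = np.bincount(words, minlength=64).astype(np.float32)
    return hist / max(hist.sum(), 1.0)

# Online: fixed-length signatures make frame comparison a dot product.
query = bow_signature(rng.random((450, 32), dtype=np.float32))
ref = bow_signature(train_descriptors[0])
print("similarity:", float(query @ ref))
```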
This paper presents a Multi-Elevation Semantic Segmentation Image (MESSI) dataset comprising 2525 images taken by a drone flying over dense urban environments. MESSI is unique in two main features. First, it contains images from various altitudes, allowing us to investigate the effect of depth on semantic segmentation. Second, it includes images taken from several different urban regions (at different altitudes). This is important since the variety covers the visual richness captured by a drone's 3D flight, performing horizontal and vertical maneuvers. MESSI contains images annotated with location, orientation, and the camera's intrinsic parameters and can be used to train a deep neural network for semantic segmentation or other applications of interest (e.g., localization, navigation, and tracking). This paper describes the dataset and provides annotation details. It also explains how semantic segmentation was performed using several neural network models and shows several relevant statistics. MESSI will be published in the public domain to serve as an evaluation benchmark for semantic segmentation using images captured by a drone or similar vehicle flying over a dense urban environment.
https://arxiv.org/abs/2505.08589
This paper audits damage labels derived from coincident satellite and drone aerial imagery for 15,814 buildings across Hurricanes Ian, Michael, and Harvey, finding 29.02% label disagreement and significantly different distributions between the two sources, which presents risks and potential harms during the deployment of machine learning damage assessment systems. Currently, there is no known study of label agreement between drone and satellite imagery for building damage assessment. The only prior work that could be used to infer whether such imagery-derived labels agree is limited by differing damage label schemas, misaligned building locations, and low data quantities. This work overcomes these limitations by comparing damage labels using the same damage label schemas and building locations from three hurricanes, with the 15,814 buildings representing 19.05 times more buildings considered than the most relevant prior work. The analysis finds satellite-derived labels significantly under-report damage by at least 20.43% compared to drone-derived labels (p<1.2x10^-117), and satellite- and drone-derived labels represent significantly different distributions (p<5.1x10^-175). This indicates that computer vision and machine learning (CV/ML) models trained on at least one of these distributions will misrepresent actual conditions, as the differing satellite- and drone-derived distributions cannot simultaneously represent the distribution of actual conditions in a scene. This potential misrepresentation poses ethical risks and potential societal harm if not managed. To reduce the risk of future societal harms, this paper offers four recommendations to improve reliability and transparency for decision-makers when deploying CV/ML damage assessment systems in practice.
https://arxiv.org/abs/2505.08117
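The statistical machinery behind such an audit is standard: a paired per-building disagreement rate plus a test on the two label distributions. The sketch below uses a chi-square contingency test as one plausible choice; the counts and random label vectors are made up for illustration and are not the paper's data.

```python
import numpy as np
from scipy.stats import chi2_contingency

labels = ["no damage", "minor", "major", "destroyed"]
satellite = np.array([9000, 4000, 2000, 814])    # hypothetical counts (sum 15,814)
drone = np.array([6500, 4500, 3300, 1514])       # hypothetical counts (sum 15,814)

chi2, p, dof, _ = chi2_contingency(np.vstack([satellite, drone]))
print(f"chi2={chi2:.1f}, p={p:.2e}  (small p => different distributions)")

# Paired per-building disagreement rate, given aligned label vectors.
sat_lab = np.random.randint(0, 4, 15814)         # stand-in labels
dro_lab = np.random.randint(0, 4, 15814)
print("disagreement rate:", (sat_lab != dro_lab).mean())
```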
Unmanned Aerial Vehicles (UAVs), also known as drones, have gained popularity in various fields such as agriculture, emergency response, and search and rescue operations. UAV networks are susceptible to several security threats, such as wormhole, jamming, spoofing, and false data injection. The Time Delay Attack (TDA) is a unique attack in which malicious UAVs intentionally delay packet forwarding, posing significant threats, especially in time-sensitive applications. It is challenging to distinguish malicious delay from benign network delay due to the dynamic nature of UAV networks, intermittent wireless connectivity, and the Store-Carry-Forward (SCF) mechanism used in multi-hop communication. Some existing works propose machine-learning-based centralized approaches to detect TDA, which are computationally intensive and have large message overheads. This paper proposes a novel approach, DATAMUt, in which the temporal dynamics of the network are represented by a weighted time-window graph (TWiG), and two deterministic polynomial-time algorithms are presented to detect TDA when UAVs have global and local network knowledge, respectively. Simulation studies show that the proposed algorithms reduce message overhead by factors of five and twelve under global and local knowledge, respectively, compared to existing approaches. Additionally, our approaches reduce execution time by factors of approximately 860 and 1050 under global and local knowledge, respectively, outperforming the existing methods.
https://arxiv.org/abs/2505.07670
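A heavily hedged sketch of the detection intuition: within each time window, compare a node's observed forwarding delay against the delay explainable by benign conditions (mobility, SCF buffering), and flag nodes whose excess delay persists across windows. The paper's TWiG-based algorithms are more involved; the thresholds and data structures below are inventions for illustration.

```python
def flag_delay_attackers(windows, tolerance=0.2, min_windows=3):
    """windows: list of dicts {uav_id: (observed_delay, expected_delay)}.
    A node is flagged if its delay exceeds the benign bound in enough windows."""
    suspicion = {}
    for w in windows:
        for uav, (observed, expected) in w.items():
            if observed > expected * (1 + tolerance):   # beyond benign delay
                suspicion[uav] = suspicion.get(uav, 0) + 1
    return {u for u, n in suspicion.items() if n >= min_windows}

# Example: "u3" delays packets in every window; the others vary benignly.
obs = [{"u1": (1.0, 1.0), "u2": (1.1, 1.0), "u3": (2.4, 1.0)},
       {"u1": (0.9, 1.0), "u2": (1.0, 1.0), "u3": (2.2, 1.0)},
       {"u1": (1.2, 1.0), "u2": (1.1, 1.0), "u3": (2.5, 1.0)}]
print(flag_delay_attackers(obs))   # -> {'u3'}
```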
In this paper, the dual-optical attention fusion crowd head point counting model (TAPNet) is proposed to address the difficulty of accurate crowd counting under the UAV view in complex scenes such as dense crowd occlusion and low light. The model designs a dual-optical attention fusion module (DAFP) that introduces complementary information from infrared images to improve the accuracy and robustness of all-day crowd counting. To fully utilize the different modalities and resolve the inaccurate localization caused by systematic misalignment between image pairs, this paper also proposes an adaptive dual-optical feature decomposition fusion module (AFDF). In addition, we optimize the training strategy to improve model robustness through spatial random-offset data augmentation. Experiments on two challenging public datasets, DroneRGBT and GAIIC2, show that the proposed method outperforms existing techniques, especially in challenging dense low-light scenes. Code is available at this https URL
https://arxiv.org/abs/2505.06937
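To illustrate the general shape of dual-optical attention fusion, here is a generic RGB-plus-infrared channel-attention block; it is in the spirit of a fusion module like DAFP but is not the paper's architecture, and all layer choices are assumptions.

```python
import torch
import torch.nn as nn

class DualOpticalFusion(nn.Module):
    """Per-channel gate deciding how much to trust each modality's features."""
    def __init__(self, channels=64):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                      # global context
            nn.Conv2d(2 * channels, channels, 1), nn.ReLU(),
            nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, rgb_feat, ir_feat):
        a = self.gate(torch.cat([rgb_feat, ir_feat], dim=1))
        return a * rgb_feat + (1 - a) * ir_feat   # convex per-channel blend

fuse = DualOpticalFusion()
rgb = torch.randn(2, 64, 32, 32)   # backbone features from RGB frames
ir = torch.randn(2, 64, 32, 32)    # features from aligned infrared frames
print(fuse(rgb, ir).shape)         # torch.Size([2, 64, 32, 32])
```

In low light the gate can lean toward the infrared channels; in daylight, toward RGB, which is the intuition behind "all-day" counting.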
Modern autopilot systems are prone to sensor attacks that can jeopardize flight safety. To mitigate this risk, we propose a modular solution: the secure safety filter, which extends the well-established control barrier function (CBF)-based safety filter to account for, and mitigate, sensor attacks. This module consists of a secure state reconstructor (which generates plausible states) and a safety filter (which computes the safe control input that is closest to the nominal one). Differing from existing work focusing on linear, noise-free systems, the proposed secure safety filter handles bounded measurement noise and, by leveraging reduced-order model techniques, is applicable to the nonlinear dynamics of drones. Software-in-the-loop simulations and drone hardware experiments demonstrate the effectiveness of the secure safety filter in rendering the system safe in the presence of sensor attacks.
https://arxiv.org/abs/2505.06845
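The "safe input closest to the nominal one" step is the standard CBF quadratic program. A minimal sketch follows, using single-integrator toy dynamics and cvxpy; the paper's contribution (secure state reconstruction, bounded noise, reduced-order drone dynamics) sits on top of this and is not shown. The barrier function and numbers are illustrative.

```python
import cvxpy as cp
import numpy as np

alpha = 1.0
x = np.array([1.5, 0.0])          # plausible state (from the reconstructor)
u_nom = np.array([-2.0, 0.0])     # nominal controller output

# Barrier h(x) = ||x||^2 - r^2 keeps the state outside a disk of radius r.
r = 1.0
h = x @ x - r**2
grad_h = 2 * x                    # with x_dot = u, h_dot = grad_h . u

# CBF-QP: stay as close to u_nom as possible subject to h_dot >= -alpha*h.
u = cp.Variable(2)
prob = cp.Problem(cp.Minimize(cp.sum_squares(u - u_nom)),
                  [grad_h @ u >= -alpha * h])
prob.solve()
print("safe input:", u.value)     # the unsafe component of u_nom is braked
```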
This study presents an advanced multi-view drone swarm imaging system for the three-dimensional characterization of smoke plume dispersion dynamics. The system comprises a manager drone and four worker drones, each equipped with high-resolution cameras and precise GPS modules. The manager drone uses image feedback to autonomously detect and position itself above the plume, then commands the worker drones to orbit the area in a synchronized circular flight pattern, capturing multi-angle images. The camera poses of these images are first estimated, then the images are grouped in batches and processed using Neural Radiance Fields (NeRF) to generate high-resolution 3D reconstructions of plume dynamics over time. Field tests demonstrated the ability of the system to capture critical plume characteristics including volume dynamics, wind-driven directional shifts, and lofting behavior at a temporal resolution of about 1 s. The 3D reconstructions generated by this system provide unique field data for enhancing the predictive models of smoke plume dispersion and fire spread. Broadly, the drone swarm system offers a versatile platform for high resolution measurements of pollutant emissions and transport in wildfires, volcanic eruptions, prescribed burns, and industrial processes, ultimately supporting more effective fire control decisions and mitigating wildfire risks.
https://arxiv.org/abs/2505.06638
Accurate, real-time collision detection is essential for ensuring player safety and effective refereeing in high-contact sports such as rugby, particularly given the severe risks associated with traumatic brain injuries (TBI). Traditional collision-monitoring methods employing fixed cameras or wearable sensors face limitations in visibility, coverage, and responsiveness. Previously, we introduced a framework using unmanned aerial vehicles (UAVs) for monitoring and real-time kinematics extraction from videos of collision events. In this paper, we show that strategies designed to ensure that at least one UAV captures every incident on the pitch have an emergent property: they fulfill a stronger key condition for successful kinematics extraction. Namely, they ensure that almost all collisions are captured by multiple drones, establishing multi-view fidelity and redundancy, while not requiring any drone-to-drone communication.
https://arxiv.org/abs/2505.06588